Preliminary specification
Supersedes PNX1300 data of 2002 Feb 15
File under Integrated Circuits, TR1
2004 Aug 20

Integrated Circuits
PNX1300 Series Media Processors

2002 Feb 15
Philips Semiconductors
Preliminary specification
Media Processors
PNX1300 Series
PNX1300 Series Data Book

Foreword
Table of Contents
1 Pin List
2 Overview
3 DSPCPU Architecture
4 Custom Operations for Multimedia
5 Cache Architecture
6 Video In
7 Enhanced Video Out
8 Audio In
9 Audio Out
10 SPDIF Out
11 PCI Interface
12 SDRAM Memory System
13 System Boot
14 Image Coprocessor
15 Variable Length Decoder
16 I2C Interface
17 Synchronous Serial Interface
18 JTAG Functional Specification
19 On-Chip Semaphore Assist Device
20 Arbiter
21 Power Management
22 PCI-XIO Bus Functional Specification
A DSPCPU Operations
B MMIO Register Summary
C Endian-ness
Index

Preliminary specification
© 2001-2004 Philips Electronics North America Corporation. All rights reserved. See terms and conditions on the next page.
2004 Aug 20
Terms and Conditions

Philips Semiconductors and Philips Electronics North America Corporation reserve the right to make changes, without notice, in the products, including circuits, standard cells, and/or software, described or contained herein in order to improve design and/or performance. Philips Semiconductors assumes no responsibility or liability for the use of any of these products, conveys no license or title under any patent, copyright, or mask work right to these products, and makes no representations or warranties that these products are free from patent, copyright, or mask work right infringement, unless otherwise specified. Applications that are described herein for any of these products are for illustrative purposes only. Philips Semiconductors makes no representation or warranty that such applications will be suitable for the specified use without further testing or modification.

Life Support Applications

Philips Semiconductors and Philips Electronics North America Corporation products are not designed for use in life support appliances, devices, or systems where malfunction of a Philips Semiconductors and Philips Electronics North America Corporation product can reasonably be expected to result in a personal injury. Philips Semiconductors and Philips Electronics North America Corporation customers using or selling Philips Semiconductors and Philips Electronics North America Corporation products for use in such applications do so at their own risk and agree to fully indemnify Philips Semiconductors and Philips Electronics North America Corporation for any damages resulting from improper use or sale.

Philips Semiconductors and Philips Electronics North America Corporation register eligible circuits under the Semiconductor Chips Protection Act.

© 2001, 2002, 2003, 2004 Philips Electronics North America Corporation. All rights reserved. Printed in U.S.A.

Business Line Media Processing, 811 E. Arques Avenue, Sunnyvale, CA 94088

Definitions

Data sheet identification | Product status | Definition
Objective specification | Formative or in design | This data sheet contains the design target or goal specifications for product development. Specifications may change in any manner without notice.
Preliminary specification | Preproduction product | This data sheet contains preliminary data, and supplementary data will be published at a later date. Philips Semiconductors reserves the right to make changes at any time without notice in order to improve design and supply the best possible product.
Product specification | Full production | This data sheet contains final specifications. Philips Semiconductors reserves the right to make changes at any time without notice, in order to improve the design and supply the best possible product.
Foreword

The TriMedia™ PNX1300 Series is an enhanced version of the TM-1300 family of media processors. The PNX1300 Series contains an ultra-high performance Very Long Instruction Word processor, as well as a complete intelligent video and audio input/output subsystem. The processor has an instruction set that is optimized for processing audio, video and graphics. It includes powerful SIMD multimedia operators for eight- and 16-bit signal datatypes as well as a full complement of 32-bit IEEE-compatible floating point operations.

The PNX1300 Series is intended as a multi-standard programmable video, audio and graphics processor. It can either be used standalone, or as an accelerator to a general purpose processor.

The architecture of the TriMedia family came about as the result of many years of effort of many dedicated individuals. Going back in history, the origin of TriMedia was laid by the LIFE-1 VLIW processor, designed by Junien Labrousse and myself in 1987. Work continued afterwards in Philips Research Labs, Palo Alto. My special thanks go to the entire Palo Alto research team: Mike Ang, Uzi Bar-Gadda, Peter Donovan, Martin Freeman, Eino Jacobs, Beomsup Kim, Bob Law, Yen Lee, Vijay Mehra, Pieter van der Meulen, Ross Morley, Mariette Parekh, Bill Sommer, Artur Sorkin and Pierre Uszynski.

The Palo Alto period matured the architecture: we ported all video and audio algorithms that we could find to the compiler/simulator and refined the operation set. In addition, we learned how to give the architecture a market direction.

In May 1994, Philips management, in particular Cees-Jan Koomen, Eddy Odijk, Theo Claasen and Doug Dunn, decided to develop TriMedia into a major Philips Semiconductors product line. Under the guidance of Keith Flagler, the TriMedia team was built. All of them contributed to take this from a set of interesting ideas to a reliable and competitive product in a short period of time.
The initial TriMedia team included Fuad Abu Nofal, Karel Allen, Mike Ang, Robert Aquino, Manju Asthana, Patrick de Bakker, Shiv Balakrishnan, Jai Bannur, Marc Berger, Sunil Bhandari, Rusty Biesele, Ahmet Bindal, David Blakely, Hans Bouwmeester, Steve Bowden, Robert Bradfield, Nancy Breede, Shawn Brown, Sujay Chari, Catherine Chen, Howen Chen, Yan-Ming Chen, Yong Cho, Scott Clapper, Matthew Clayson, Paul Coelho, Richard Dodds, Marc Duranton, Darcia Eding, Aaron Emigh, Li Chi Feng, Keith Flagler, Jean Gobert, Sergio Golombek, Mike Grimwood, Yudi Halim, Hari Hampapuram, Carl Hartshorn, Judy Heider, Laura Hrenko, Jim Hsu, Eino Jacobs, Marcel Janssens, Patricia Jones, Hann-Hwan Ju, Jayne Keith, Bhushan Kerur, Ayub Khan, Keith Knowles, Mike Kong, Ashok Krishnamurti, Yen Lee, Patrick Leong, Bill Lin, Laura Ling, Chialun Lu, Naeem Maan, Nahid Mansipur, Mike Maynard, Vijay Mehra, Jun Mejia, Derek Meyer, Prabir Mohanty, Saed Muhssin, Chris Nelson, Stephen Ness, Keith Ngo, Francis Nguyen, Kathleen Nguyen, Derek Noonburg, Ciaran O'Donnel, Sang-Ju Park, Charles Peplinski, Gene Pinkston, Maryam Pirayou, Pardha Potana, Bill Price, Victor Ramamoorthy, Babu Rao Kandamilla, Ehsan Rashid, Selliah Rathnam, Margaret Redmond, Donna Richardson, Alan Rodgers, Tilakray Roychoudhury, Hani Salloum, Chris Salzmann, Bob Seltzer, Ravi Selvaraj, Jim Shimandle, Deepak Singh, Bill Sommer, Juul van der Spek, Manoj Srivastava, Renga Sundararajan, Ken-Sue Tan, Ray Ton, Steve Tran, Cynthia Tripp, Ching-Yih Tseng, Allan Tzeng, Barbara Vendelin, John Vivit, Rudy Wang, Rogier Wester, Wayne Wonchoba, Anthony Wong, Sara Wu, David Wyland, Ken Xie, Vincent Xie, Bettina Yeung, Robert Yin, Charles Young, Grace Yun, Elena Zelayeta and Vivian Zhu.

Expert help and feedback were received from many.
In particular, I'd like to mention Kees van Zon of Philips Eindhoven for the help with filtering-related issues, and Craig Clapp of PictureTel for excellent feedback on all aspects of the architecture.

My special thanks go to Joe Kostelec. He made me understand that my ambitions could better be realized in California than in Europe. Furthermore, his vision and his wisdom are credited with keeping this project alive and growing until the "investment decision."

The vision of a universal media accelerator is credited to Jaap de Hoog. Jaap, I wish you were here to see it come to fruition.

- Gerrit Slavenburg

After the initial TM-1000 product, the TM-1100, TM-1300 and now PNX1300 Series chips have been successfully integrated in many video and audio products. It has been my pleasure to have been involved in these designs, and I would like to thank the people involved in the TM-1300 and PNX1300 Series projects under the guidance of Cees Hartgring and Simon Wegerif. The team included Karel Allen, Tien-Cheng Bau, Jim Campbell, Anitamk Chan, John Chang, Roel Coppoolse, Taufik Dakhil, Mitch Daniil, Nam Dao, Patrick Debaumarche, Thuy Duong, Torsten Fink, Jan Grotenbreg, Mohammad Hafeez, Feng Hao, Farah Jubran, Babu Rao Kandamalla, Aki Kaniel, Yan-Ling Li, Ying-Chao Liu, Naeem Maan, Don Marshal, Thomas Meyer, Javed Mukarram, Long Nguyen, Tu Nghiem, Elaine Outler, Charles Peplinski, Duc T. Pham, Thorwald Rabeler, Raquel Ruiz, Ensieh Saffari, Hani Salloum, Wenyi Song, Stephen Tomasello, Tran Tung, Maria F. Wangsahamidjaja, Chang-Ming Yang, Mohammed I. Yousuf, Hui Zhang and Gerrit Slavenburg.

- Luis Lucas
PNX1300/01/02/11 Data Book, Philips Semiconductors, Preliminary Information
Table of Contents

Foreword

1 Pin List
1.1 PNX1300 Series versus TM-1300
1.2 Boundary Scan Notice
1.3 I/O Circuit Summary
1.4 Signal Pin List
1.5 Power Pin List
1.6 Pin Reference Voltage
1.7 Package
1.8 Ordering Information
1.8.1 Lead Parts: Last Time Buy for These Parts Is September 30, 2005
1.8.2 Lead-Free Parts: Available for Ordering Starting October 1, 2004
1.9 Parametric Characteristics
1.9.1 PNX1300/01/02/11 Absolute Maximum Ratings
1.9.2 PNX1300/01/02 Operating Range and Thermal Characteristics
1.9.3 PNX1311 Operating Range and Thermal Characteristics
1.9.4 PNX1300/01/02/11 Power Supply Sequencing
1.9.5 PNX1300/01/02 DC/AC Characteristics
1.9.6 PNX1311 DC/AC Characteristics
1.9.7 PNX1300 Series Power Consumption
1.9.7.1 Power Consumption for Applications on PNX1300 Series
1.9.7.2 PNX1300/01/02 DSPCPU Core Current and Power Consumption
1.9.7.3 PNX1311 DSPCPU Core Current and Power Consumption Details
1.9.7.4 PNX1300/01/02 Current Consumption for On-Chip Peripherals
1.9.7.5 PNX1311 Current Consumption for On-Chip Peripherals
1.9.7.6 STRG3, STRG5 Type I/O Circuit
1.9.7.7 NORM3 Type I/O Circuit
1.9.7.8 WEAK5 Type I/O Circuit
1.9.7.9 IICOD (I2C) Type I/O Circuit
1.9.7.10 SDRAM Interface Timing for PNX1300/01/02/11 Speed Grades
1.9.7.11 PCI Bus Timing
1.9.7.12 JTAG I/O Timing
1.9.7.13 I2C I/O Timing
1.9.7.14 Video In I/O Timing
1.9.7.15 Video Out I/O Timing
1.9.7.16 Audio In I/O Timing
1.9.7.17 Audio Out I/O Timing
1.9.7.18 SSI I/O Timing

2 Overview
2.1 Introduction
2.2 PNX1300 Fundamentals
2.3 PNX1300 Chip Overview
2.4 Brief Examples of Operation
2.4.1 Video Decompression in a PC
2.4.2 Video Compression
2.5 Introduction to PNX1300 Blocks
2.5.1 Internal "Data Highway" Bus
2.5.2 VLIW Processor Core
2.5.3 Video In Unit
2.5.4 Enhanced Video Out Unit
2.5.5 Image Coprocessor
2.5.6 Variable-Length Decoder (VLD)
2.5.7 Audio In and Audio Out Units
2.5.8 S/PDIF Out Unit
2.5.9 Synchronous Serial Interface
2.5.10 I2C Interface
2.6 New in PNX1300 (versus TM-1300)
2.7 New in PNX1300 (versus TM-1100)
2.8 New in PNX1300 (versus TM-1000)

3 DSPCPU Architecture
3.1 Basic Architecture Concepts
3.1.1 Register Model
3.1.2 Basic DSPCPU Execution Model
3.1.3 PCSW Overview
3.1.4 SPC and DPC: Source and Destination Program Counter
3.1.5 CCCOUNT: Clock Cycle Counter
3.1.6 Boolean Representation
3.1.7 Integer Representation
3.1.8 Floating Point Representation
3.1.9 Addressing Modes
3.1.10 Software Compatibility
3.2 Instruction Set Overview
3.2.1 Guarding (Conditional Execution)
3.2.2 Load and Store Operations
3.2.3 Compute Operations
3.2.4 Special-Register Operations
3.2.5 Control-Flow Operations
3.3 PNX1300 Instruction Issue Rules
3.4 Memory and MMIO
3.4.1 Memory Map
3.4.2 The Memory Hole
3.4.3 MMIO Memory Map
3.5 Special Event Handling
3.5.1 RESET
3.5.2 EXC (Exceptions)
3.5.3 INT and NMI (Maskable and Non-Maskable Interrupts)
3.5.3.1 Interrupt Vectors
3.5.3.2 Interrupt Modes
3.5.3.3 Device Interrupt Acknowledge
3.5.3.4 Interrupt Priorities
3.5.3.5 Interrupt Masking
3.5.3.6 Software Interrupts and Acknowledgment
3.5.3.7 NMI Sequentialization
3.5.3.8 Interrupt Source Assignment
3.6 PNX1300 to Host Interrupts
3.7 Host to PNX1300 Interrupts
3.8 Timers
3.9 Debug Support
3.9.1 Instruction Breakpoints
3.9.2 Data Breakpoints

4 Custom Operations for Multimedia
4.1 Custom Operations Overview
4.1.1 Custom Operation Motivation
4.1.2 Introduction to Custom Operations
4.1.3 Example Uses of Custom Ops
4.2 Example 1: Byte-Matrix Transposition
4.3 Example 2: MPEG Image Reconstruction
4.4 Example 3: Motion-Estimation Kernel
4.4.1 A Simple Transformation
4.4.2 More Unrolling

5 Cache Architecture
5.1 Memory System Overview
5.2 DRAM Aperture
5.3 Data Cache
5.3.1 General Cache Parameters
5.3.2 Address Mapping
5.3.3 Miss Processing Order
5.3.4 Replacement Policies, Coherency
5.3.5 Alignment, Partial-Word Transfers, Endian-ness
5.3.6 Dual Ports
5.3.7 Cache Locking
5.3.8 Memory Hole and PCI Aperture Disable
5.3.9 Non-Cacheable Region
5.3.10 Special Data Cache Operations
5.3.10.1 Copyback and Invalidate Operations
5.3.10.2 Data Cache Tag and Status Operations
5.3.10.3 Data Cache Allocation Operation
5.3.10.4 Data Cache Prefetch Operation
5.3.11 Memory Operation Ordering
5.3.12 Operation Latency
5.3.13 MMIO Register References
5.3.14 PCI Bus References
5.3.15 CPU Stall Conditions
5.3.16 Data Cache Initialization
5.4 Instruction Cache
5.4.1 General Cache Parameters
5.4.2 Address Mapping
5.4.3 Miss Processing Order
5.4.4 Replacement Policy
5.4.5 Location of Program Code
5.4.6 Branch Units
5.4.7 Coherency: Special iclr Operation
5.4.8 Reading Tags and Cache Status
5.4.9 Cache Locking
5.4.10 Instruction Cache Initialization and Boot Sequence
5.5 LRU Algorithm
5.5.1 Two-Way Algorithm
5.6 Cache Coherency
5.6.1 Example 1: Data-Cache/Input-Unit Coherency
5.6.2 Example 2: Data-Cache/Output-Unit Coherency
5.6.3 Example 3: Instruction-Cache/Data-Cache Coherency
5.6.4 Example 4: Instruction-Cache/Input-Unit Coherency
5.6.5 Four-Way Algorithm
5.6.6 LRU Initialization
5.6.7 lru bit definitions ... 5-12
5.6.8 lru for the dual-ported cache ... 5-12
5.7 performance evaluation support ... 5-12
5.8 mmio register summary ... 5-13

6 video in
6.1 video in overview ... 6-1
6.1.1 interface ... 6-1
6.1.2 diagnostic mode ... 6-2
6.1.3 power down and sleepless ... 6-2
6.1.4 hardware and software reset ... 6-2
6.2 clock generator ... 6-4
6.3 fullres capture mode ... 6-4
6.4 halfres capture mode ... 6-9
6.5 raw capture modes ... 6-10
6.6 message-passing mode ... 6-11
6.6.1 vi_dvalid in message passing mode ... 6-12
6.7 highway latency and hbe ... 6-13

7 enhanced video out
7.1 enhanced video out summary ... 7-1
7.2 about this document ... 7-1
7.3 backward compatibility ... 7-1
7.4 function summary ... 7-1
7.4.1 detailed feature descriptions ... 7-2
7.4.2 summary of operation ... 7-2
7.5 interface ... 7-2
7.6 block diagram ... 7-3
7.7 clock system ... 7-3
7.8 image timing ... 7-4
7.8.1 ccir 656 pixel timing ... 7-4
7.8.2 ccir 656 line timing ... 7-4
7.8.3 sav and eav codes ... 7-5
7.8.4 video clipping ... 7-6
7.8.5 ccir 656 frame timing ... 7-6
7.9 enhanced video out timing generation ... 7-6
7.9.1 active video area ... 7-6
7.9.2 sav and eav overlap period ... 7-7
7.9.3 control of frame and image counters ... 7-7
7.9.4 horizontal and frame timing signals ... 7-7
7.10 genlock mode ... 7-8
7.11 data transfer timing ... 7-9
7.12 image data memory formats ... 7-9
7.12.1 video image formats ... 7-9
7.12.2 planar storage of video image data in memory ... 7-10
7.12.3 graphics overlay image format ... 7-10
7.13 video image conversion algorithms ... 7-10
7.13.1 yuv 4:2:2 interspersed to yuv 4:2:2 co-sited conversion ... 7-11
7.13.2 yuv 4:2:0 to yuv 4:2:2 co-sited conversion ... 7-11
7.13.3 yuv-2x upscaling ... 7-11
7.13.4 pixel mirroring for four-tap filters ... 7-11
7.14 evo operating modes ... 7-13
7.15 video processing ... 7-13
7.15.1 alpha blending ... 7-13
7.15.2 chroma keying ... 7-14
7.15.3 programmable clipping ... 7-14
7.16 mmio registers ... 7-14
7.16.1 vo status register (vo_status) ... 7-16
7.16.2 vo control register (vo_ctl) ... 7-17
7.16.3 vo-related registers ... 7-18
7.16.4 evo control register (evo_ctl) ... 7-20
7.16.5 evo-related registers ... 7-21
7.17 enhanced video out operation ... 7-21
7.17.1 video refresh modes ... 7-21
7.18 frame and field timing control ... 7-23
7.18.1 recommended values for timing registers ... 7-23
7.18.2 data-transfer modes ... 7-23
7.18.3 interrupts and error conditions ... 7-23
7.18.4 latency and bandwidth requirements ... 7-24
7.18.5 power down and sleepless ... 7-24
7.19 dds and pll filter details ... 7-25

8 audio in
8.1 audio in overview ... 8-1
8.2 external interface ... 8-1
8.3 clock system ... 8-2
8.3.1 pnx1300 improved mode ... 8-2
8.3.2 tm-1000 compatibility mode ... 8-2
8.4 clock system operation ... 8-2
8.5 serial data framing ... 8-3
8.6 memory data formats ... 8-4
8.7 audio in operation ... 8-6
8.8 power down and sleepless ... 8-7
8.9 highway latency and hbe ... 8-7
8.10 error behavior ... 8-7
8.11 diagnostic mode ... 8-7

9 audio out
9.1 audio out overview ... 9-1
9.2 external interface ... 9-1
9.3 summary of operation ... 9-2
9.4 internal clock source ... 9-3
9.4.1 pnx1300 standard improved mode ... 9-3
9.4.2 tm-1000 compatibility mode ... 9-4
9.5 clock system operation ... 9-4
9.6 serial data framing ... 9-4
9.6.1 serial frame limitations ... 9-5
9.6.2 i2s serial framing example ... 9-6
9.7 codec control ... 9-6
9.8 memory data formats ... 9-7
9.9 audio out operation ... 9-8
9.10 interrupts ... 9-9
9.11 timestamp ... 9-10
9.12 powerdown and sleepless ... 9-10
9.13 highway latency and hbe ... 9-10
9.14 error behavior ... 9-11

10 spdif out
10.1 spdif out overview ... 10-1
10.2 external interface ... 10-1
10.3 summary of operation ... 10-1
10.3.1 spdif mode ... 10-1
10.3.2 transparent dma mode ... 10-1
10.4 iec-958 serial format ... 10-2
10.5 iec-958 bit cell and pre-amble ... 10-2
10.6 iec-958 parity ... 10-3
10.7 iec-958 memory data format ... 10-3
10.8 sample rate programming ... 10-3
10.9 transparent mode ... 10-4
10.10 dma operation ... 10-4
10.11 dma error conditions ... 10-4
10.12 interrupts ... 10-4
10.13 timestamps ... 10-4
10.14 mmio register description ... 10-5
10.15 reset ... 10-6
10.16 power down and sleepless ... 10-6
10.17 hbe and highway latency ... 10-6
10.18 literature references ... 10-7

11 pci interface
11.1 pci overview ... 11-1
11.2 pci interface as an initiator ... 11-2
11.2.1 dspcpu single-word loads/stores ... 11-2
11.2.2 i/o operations ... 11-2
11.2.3 configuration operations ... 11-2
11.2.4 dma operations ... 11-2
11.3 pci interface as a target ... 11-3
11.4 transaction concurrency, priorities, and ordering ... 11-3
11.5 registers addressed in pci configuration space ... 11-3
11.5.1 vendor id register ... 11-3
11.5.2 device id register ... 11-3
11.5.3 command register ... 11-3
11.5.4 status register ... 11-5
11.5.5 revision id register ... 11-6
11.5.6 class code register ... 11-6
11.5.7 cache line size register ... 11-7
11.5.8 latency timer register ... 11-7
11.5.9 header type register ... 11-7
11.5.10 built-in self test register ... 11-7
11.5.11 base address registers ... 11-7
11.5.12 subsystem id, subsystem vendor id register ... 11-9
11.5.13 expansion rom base address register ... 11-9
11.5.14 interrupt line register ... 11-9
11.5.15 interrupt pin register ... 11-9
11.5.16 max_lat, min_gnt registers ... 11-9
11.6 registers in mmio space ... 11-9
11.6.1 dram_base register ... 11-9
11.6.2 mmio_base register ... 11-9
11.6.3 mmio/dram_base updates ... 11-10
11.6.4 biu_status register ... 11-11
11.6.5 biu_ctl register ... 11-11
11.6.6 pci_adr register ... 11-12
11.6.7 pci_data register ... 11-12
11.6.8 config_adr register ... 11-12
11.6.9 config_data register ... 11-13
11.6.10 config_ctl register ... 11-13
11.6.11 io_adr register ... 11-13
11.6.12 io_data register ... 11-13
11.6.13 io_ctl register ... 11-13
11.6.14 src_adr register ... 11-14
11.6.15 dest_adr register ... 11-14
11.6.16 dma_ctl register ... 11-14
11.6.17 int_ctl register ... 11-15
11.7 pci bus protocol overview ... 11-15
11.7.1 single-data-phase operations ... 11-16
11.7.2 multi-data-phase operations ... 11-16
11.8 limitations ... 11-17
11.8.1 bus locking ... 11-17
11.8.2 no expansion rom ... 11-17
11.8.3 no cacheline wrap address sequence ... 11-17
11.8.4 no burst for i/o or configuration space ... 11-17
11.8.5 word-only mmio register access ... 11-17

12 sdram memory system
12.1 new in pnx1300/01/02/11 ... 12-1
12.2 pnx1300 main memory overview ... 12-1
12.3 main-memory address aperture ... 12-1
12.4 memory devices supported ... 12-2
12.4.1 sdram ... 12-2
12.4.2 sgram ... 12-2
12.5 memory granularity and sizes ... 12-2
12.6 memory system programming ... 12-3
12.6.1 mm_config register ... 12-3
12.6.2 pll_ratios register ... 12-4
12.7 memory interface pin list ... 12-5
12.8 address mapping ... 12-5
12.8.1 address mapping in 32-bit mode ... 12-5
12.8.2 address mapping in 16-bit mode ... 12-6
12.9 memory interface and sdram initialization ... 12-6
12.10 on-chip sdram interleaving ... 12-6
12.11 refresh ... 12-6
12.12 power-down mode ... 12-7
12.13 output driver capacity ... 12-7
12.14 signal propagation delay compensation ... 12-7
12.15 circuit board design ... 12-7
12.15.1 general guidelines ... 12-7
12.15.2 specific guidelines ... 12-8
12.15.3 termination ... 12-8
12.16 timing budget ... 12-8
12.16.1 main ac parameter requirements ... 12-9
12.17 example block diagrams ... 12-9
12.17.1 block diagrams for a 32-bit interface ... 12-9
12.17.1.1 16-mbit devices or less ... 12-9
12.17.1.2 64-mbit devices ... 12-10
12.17.1.3 128-mbit devices ... 12-13
12.17.1.4 256-mbit devices ... 12-16
12.17.2 block diagrams for a 16-bit interface ... 12-17

13 system boot
13.1 boot sequence overview ... 13-1
13.2 boot hardware operation ... 13-2
13.2.1 boot procedure common to both autonomous and host-assisted bootstrap ... 13-2
13.2.2 initial dspcpu program load for autonomous bootstrap ... 13-5
13.3 host-assisted boot description ... 13-6
13.3.1 stage 1: pnx1300 system boot hardware ... 13-6
13.3.2 stage 2: host-system pci configuration ... 13-6
13.3.3 stage 3: pnx1300 driver executing on the host ... 13-6
13.4 detailed eeprom contents ... 13-7
13.5 eeprom access protocols ... 13-9

14 image coprocessor
14.1 image coprocessor overview ... 14-1
14.2 requirements ... 14-1
14.2.1 functions ... 14-1
14.2.2 bandwidth ... 14-1
14.2.3 image size and scaling ... 14-3
14.3 interface ... 14-3
14.4 data formats ... 14-3
14.4.1 image input formats ... 14-3
14.4.1.1 yuv 4:2:2 co-sited ... 14-3
14.4.1.2 yuv 4:2:2 interspersed ... 14-3
14.4.1.3 yuv 4:2:0 xy interspersed ... 14-3
14.4.1.4 yuv 4:1:1 co-sited ... 14-3
14.4.2 image overlay formats ... 14-5
14.4.3 alpha blending codes ... 14-5
14.4.4 output formats ... 14-5
14.5 algorithms . . . . . 14-6
14.5.1 introduction . . . . . 14-6
14.5.2 filtering . . . . . 14-6
14.5.3 scaling . . . . . 14-6
14.5.4 yuv to rgb conversion . . . . . 14-9
14.5.5 overlay and alpha blending . . . . . 14-9
14.5.6 dithering . . . . . 14-10
14.5.7 implementation overview: horizontal scaling and filtering . . . . . 14-11
14.5.7.1 loading the extra pixels in the filter . . . . . 14-12
14.5.7.2 mirroring pixels at the ends of a line . . . . . 14-12
14.5.7.3 horizontal filter sdram timing . . . . . 14-12
14.5.8 implementation overview: vertical scaling and filtering . . . . . 14-13
14.5.8.1 mirroring lines at the ends of an image . . . . . 14-15
14.5.8.2 vertical filter sdram block timing . . . . . 14-15
14.5.9 horizontal scaling and filtering for rgb output . . . . . 14-15
14.5.9.1 yuv sequence counter in yuv 4:2:2 output mode . . . . . 14-15
14.5.9.2 pci output block timing . . . . . 14-16
14.6 operation and programming . . . . . 14-16
14.6.1 icp register model . . . . . 14-17
14.6.2 power down . . . . . 14-17
14.6.3 icp operation . . . . . 14-18
14.6.4 icp microprogram set . . . . . 14-18
14.6.5 icp processing time . . . . . 14-18
14.6.6 priority delay and icp minimum bus bandwidth . . . . . 14-21
14.6.7 icp parameter tables . . . . . 14-22
14.6.8 load coefficients . . . . . 14-22
14.6.9 horizontal filter - sdram to sdram . . . . . 14-22
14.6.9.1 algorithms . . . . . 14-22
14.6.9.2 parameter table . . . . . 14-22
14.6.9.3 control word format . . . . . 14-23
14.6.10 vertical filter - sdram to sdram . . . . . 14-24
14.6.10.1 algorithms . . . . . 14-24
14.6.10.2 parameter table . . . . . 14-24
14.6.10.3 control word format . . . . . 14-25
14.6.11 horizontal filter with rgb/yuv conversion to pci or sdram . . . . . 14-25
14.6.11.1 algorithms . . . . . 14-25
14.6.11.2 parameter table . . . . . 14-26
14.6.11.3 control word format . . . . . 14-27
15 variable length decoder
15.1 vld overview . . . . . 15-1
15.2 vld operation . . . . . 15-1
15.3 decoding up to a slice . . . . . 15-2
15.4 vld input . . . . . 15-2
15.5 vld output . . . . . 15-3
15.5.1 macroblock header output data . . . . . 15-3
15.5.2 run-level output data . . . . . 15-4
15.6 vld time sharing . . . . . 15-4
15.7 mmio registers . . . . . 15-4
15.7.1 vld status (vld_status) . . . . . 15-4
15.7.2 vld interrupt enable (vld_imask) . . . . . 15-4
15.7.3 vld control (vld_ctl) . . . . . 15-5
15.8 vld dma registers . . . . . 15-5
15.8.1 dma input . . . . . 15-5
15.8.2 macroblock header output dma . . . . . 15-5
15.8.3 run-level output dma . . . . . 15-5
15.9 vld operational registers . . . . . 15-7
15.9.1 vld command (vld_command) . . . . . 15-7
15.9.2 vld shift register (vld_sr) . . . . . 15-7
15.9.3 vld quantizer scale (vld_qs) . . . . . 15-7
15.9.4 vld picture info (vld_pi) . . . . . 15-8
15.10 error handling . . . . . 15-8
15.11 interrupt . . . . . 15-8
15.12 reset . . . . . 15-8
15.13 endian-ness . . . . . 15-8
15.14 power down . . . . . 15-8
15.15 references . . . . . 15-8
16 i2c interface
16.1 i2c overview . . . . . 16-1
16.2 compared to tm-1000 . . . . . 16-1
16.3 external interface . . . . . 16-1
16.4 i2c register set . . . . . 16-1
16.4.1 iic_ar register . . . . . 16-1
16.4.2 iic_dr register . . . . . 16-2
16.4.3 iic_sr register . . . . . 16-3
16.4.4 iic_cr register . . . . . 16-4
16.5 i2c software operation mode . . . . . 16-5
16.6 i2c hardware operation mode . . . . . 16-5
16.6.1 slave nak . . . . . 16-6
16.7 i2c clock rate generation . . . . . 16-7
17 synchronous serial interface
17.1 synchronous serial interface overview . . . . . 17-1
17.2 interface . . . . . 17-1
17.3 block diagram . . . . . 17-1
17.3.1 general purpose i/o . . . . . 17-2
17.3.2 frame synchronization . . . . . 17-3
17.3.3 ssi transmit . . . . . 17-3
17.3.4 ssi receive . . . . . 17-3
17.4 ssi transmit operation . . . . . 17-5
17.4.1 setup ssi_ctl . . . . . 17-5
17.4.2 operation details . . . . . 17-5
17.4.3 interrupt and status . . . . . 17-5
17.5 ssi receive operation . . . . . 17-6
17.5.1 setup ssi_ctl . . . . . 17-6
17.5.2 operation details . . . . . 17-6
17.5.3 interrupt and status . . . . . 17-6
17.6 frame timing . . . . . 17-6
17.7 interrupt generation . . . . . 17-7
17.8 16-bit endian-ness and shift direction . . . . . 17-7
17.9 ssi test modes . . . . . 17-8
17.9.1 remote loopback . . . . . 17-8
17.9.2 local loopback . . . . . 17-8
17.10 mmio registers . . . . . 17-8
17.10.1 ssi control register (ssi_ctl) . . . . . 17-9
17.10.2 ssi control/status register (ssi_csr) . . . . . 17-11
17.11 timing diagrams . . . . . 17-12
17.12 power down . . . . . 17-12
18 jtag functional specification
18.1 overview . . . . . 18-1
18.2 test access port (tap) . . . . . 18-1
18.2.1 tap controller . . . . . 18-1
18.2.2 pnx1300 jtag instruction set . . . . . 18-2
18.3 using jtag for pnx1300 debug . . . . . 18-3
18.3.1 jtag instruction and data registers . . . . . 18-4
18.3.2 jtag communication protocol . . . . . 18-5
18.3.3 example data transfer via jtag . . . . . 18-5
18.3.3.1 transferring data to trimedia via jtag . . . . . 18-5
18.3.3.2 transferring data from trimedia via jtag . . . . . 18-6
18.3.4 jtag interface module . . . . . 18-6
19 on-chip semaphore assist device
19.1 overview . . . . . 19-1
19.2 sem device specification . . . . . 19-1
19.3 constructing a 12-bit id . . . . . 19-1
19.4 which sem to use . . . . . 19-1
19.5 usage notes . . . . . 19-1
20 arbiter
20.1 arbiter features . . . . . 20-1
20.2 dual priorities with priority raising mechanism . . . . . 20-1
20.3 round robin arbitration . . . . . 20-2
20.3.1 weighted round robin arbitration . . . . . 20-2
20.3.2 arbitration levels . . . . . 20-3
20.4 arbiter architecture . . . . . 20-4
20.5 arbiter programming . . . . . 20-5
20.5.1 latency analysis . . . . . 20-5
20.5.2 bandwidth analysis . . . . . 20-6
20.6 extended behavior analysis . . . . . 20-7
20.6.1 extended bandwidth analysis . . . . . 20-7
20.6.2 extended latency analysis . . . . . 20-7
20.6.3 raising priority . . . . . 20-8
20.6.4 conclusion . . . . . 20-8
21 power management
21.1 overview . . . . . 21-1
21.2 entering and exiting global power down mode . . . . . 21-1
21.3 effect of global power down on peripherals . . . . . 21-1
21.4 detailed sequence of events for global power down . . . . . 21-2
21.5 mmio register power_down . . . . . 21-2
21.6 block power down . . . . . 21-2
22 pci-xio external i/o bus
22.1 summary functionality . . . . . 22-1
22.1.1 description . . . . . 22-1
22.2 block diagram . . . . . 22-3
22.3 data formats . . . . . 22-5
22.4 interface . . . . . 22-5
22.4.1 pci-xio bus interface design . . . . . 22-5
22.4.1.1 flash eeprom . . . . . 22-6
22.4.1.2 68k bus i/o device . . . . . 22-6
22.4.1.3 x86/isa bus i/o device . . . . . 22-6
22.4.1.4 multiple flash eeprom . . . . . 22-6
22.5 xio_ctl mmio register . . . . . 22-7
22.5.1 pci_clk bus clock frequency . . . . . 22-7
22.5.2 wait state generator . . . . . 22-8
22.6 pci-xio bus timing . . . . . 22-8
22.7 pci-xio bus controller operation and programming . . . . . 22-12
a pnx1300/01/02/11 . . . . . dspcpu operations
a.1 alphabetic operation list . . . . . a-1
a.2 operation list by function . . . . . a-2
alloc . . . . . a-4
allocd . . . . . a-5
allocr . . . . . a-6
allocx . . . . . a-7
asl . . . . . a-8
asli . . . . . a-9
asr . . . . . a-10
asri . . . . . a-11
bitand . . . . . a-12
bitandinv . . . . . a-13
bitinv . . . . . a-14
bitor . . . . . a-15
bitxor . . . . . a-16
borrow . . . . . a-17
carry . . . . . a-18
curcycles . . . . . a-19
cycles . . . . . a-20
dcb . . . . . a-21
dinvalid . . . . . a-22
dspiabs . . . . . a-23
dspiadd . . . . . a-24
dspidualabs . . . . . a-25
dspidualadd . . . . . a-26
dspidualmul . . . . . a-27
dspidualsub . . . . . a-28
dspimul . . . . . a-29
dspisub . . . . . a-30
dspuadd . . . . . a-31
dspumul . . . . . a-32
dspuquadaddui . . . . . a-33
dspusub . . . . . a-34
dualasr . . . . . a-35
dualiclipi . . . . . a-36
dualuclipi . . . . . a-37
fabsval . . . . . a-38
fabsvalflags . . . . . a-39
fadd . . . . . a-40
faddflags . . . . . a-41
fdiv . . . . . a-42
fdivflags . . . . . a-43
feql . . . . . a-44
feqlflags . . . . . a-45
fgeq . . . . . a-46
fgeqflags . . . . . a-47
fgtr . . . . . a-48
fgtrflags . . . . . a-49
fleq . . . . . a-50
fleqflags . . . . . a-51
fles . . . . . a-52
flesflags . . . . . a-53
fmul . . . . . a-54
fmulflags . . . . . a-55
fneq . . . . . a-56
fneqflags . . . . . a-57
fsign . . . . . a-58
fsignflags . . . . . a-59
fsqrt . . . . . a-60
fsqrtflags . . . . . a-61
fsub . . . . . a-62
fsubflags . . . . . a-63
funshift1 . . . . . a-64
funshift2 . . . . . a-65
funshift3 . . . . . a-66
h_dspiabs . . . . . a-67
h_dspidualabs . . . . . a-68
h_iabs . . . . . a-69
h_st16d . . . . . a-70
h_st32d . . . . . a-71
h_st8d . . . . . a-72
hicycles . . . . . a-73
iabs
iadd
iaddi
iavgonep
ibytesel
iclipi
iclr
ident
ieql
ieqli
ifir16
ifir8ii
ifir8ui
ifixieee
ifixieeeflags
ifixrz
ifixrzflags
iflip
ifloat
ifloatflags
ifloatrz
ifloatrzflags
igeq
igeqi
igtr
igtri
iimm
ijmpf
ijmpi
ijmpt
ild16
ild16d
ild16r
ild16x
ild8
ild8d
ild8r
ileq
ileqi
iles
ilesi
imax
imin
imul
imulm
ineg
ineq
ineqi
inonzero
isub
isubi
izero
jmpf
jmpi
jmpt
ld32
ld32d
ld32r
ld32x
lsl
lsli
lsr
lsri
mergedual16lsb
mergelsb
mergemsb
nop
pack16lsb
pack16msb
packbytes
pref
pref16x
pref32x
prefd
prefr
quadavg
quadumax
quadumin
quadumulmsb
rdstatus
rdtag
readdpc
readpcsw
readspc
rol
roli
sex16
sex8
st16
st16d
st32
st32d
st8
st8d
ubytesel
uclipi
uclipu
ueql
ueqli
ufir16
ufir8uu
ufixieee
ufixieeeflags
ufixrz
ufixrzflags
ufloat
ufloatflags
ufloatrz
ufloatrzflags
ugeq
ugeqi
ugtr
ugtri
uimm
uld16
uld16d
uld16r
uld16x
uld8
uld8d
uld8r
uleq
uleqi
ules
ulesi
ume8ii
ume8uu
umin
umul
umulm
uneq
uneqi
writedpc
writepcsw
writespc
zex16
zex8

B MMIO Register Summary
B.1 MMIO registers

C Endian-ness
C.1 Purpose
C.2 Little and big endian addressing conventions
C.3 Test to verify the correct operation of PNX1300 in big and little endian systems
C.4 Requirement for the PNX1300 to operate in either little endian or big endian mode
C.4.1 Data cache
C.4.2 Instruction cache
C.4.3 PNX1300 PCI interface unit
C.4.4 Image coprocessor (ICP)
C.4.5 Video in (VI) and video out (VO) units
C.4.6 Audio in (AI), audio out (AO), and SPDIF out (SDO) units
C.4.7 Variable length decoder (VLD) unit
C.4.8 Synchronous serial interface (SSI)
C.4.9 Compiler
C.5 Summary
C.6 References

Index
Chapter 1  Pin List

By John Chang, Wenyi Song, Thorwald Rabeler, Luis Lucas

1.1 PNX1300 Series versus TM-1300

The following summarizes the differences between TM-1300 and PNX1300/01/02/11:

• Lower core voltage for PNX1311 (2.2-V core voltage) and therefore lower power consumption.
• DSPCPU speed of up to 200 MHz.
• SDRAM speed of up to 183 MHz.
• Support for 256-Mbit SDRAM organized in x16. The refresh counter must be changed. Refer to Chapter 12, "SDRAM Memory System," for details.
• Support for 16- and 32-bit main memory interface.
• Simplified power supply sequencing (see Section 1.9.4).
• Additional mode where VI_DATA[9:8] in message passing mode are not affected by the VI_DVALID signal.
• Bug fixed for PCI special cycles. PNX1300 Series discards PCI special cycles issued by some PCI chipsets.
• Autonomous boot bug in non-1:1 ratios is fixed, resulting in a 2-KB boot EEPROM size for all CPU:SDRAM ratios.

In this document, "PNX1300 Series" is used interchangeably with "PNX1300/01/02/11", and it always refers to the PNX1300, PNX1301, PNX1302 and PNX1311 products. Any exception will be noted.

1.2 Boundary Scan Notice

PNX1300 Series implements full IEEE 1149.1 boundary scan. Any PNX1300 Series pin designated "IN" only (from a functionality point of view) can become an output during boundary scan.

1.3 I/O Circuit Summary

PNX1300 Series has a total of 169 functional pins, excluding VDDQ, VSSQ, VREF_PCI, VREF_PERIPH and the digital power/ground pins. PNX1300 Series uses the types of I/O circuits shown in the table below. For the pins with 5-V input capability, the special pins VREF_PCI or VREF_PERIPH determine 3.3-V or 5-V input tolerance, as per the table in Section 1.6.

Pad type  Pad type description
PCI       PCI 2.1 compliant I/O, capable of using 3.3-V or 5-V PCI
          signaling conventions.
PCIOD     PCI 2.1 compliant open drain I/O, capable of using 3.3-V or
          5-V PCI signaling conventions.
IICOD     Open drain 3.3-V or 5-V I2C I/O (for I2C pins).
STRG3     3.3-V only low impedance I/O. Requires a board level 27-33 ohm
          series terminator resistor to match a 50-ohm PCB trace.
NORM3     3.3-V only I/O circuit with regular drive strength and board
          trace matched drive impedance.
STRG5     3.3-V low impedance output, combined with 5-V tolerant input.
          If used as output, it requires a board level 27-33 ohm series
          terminator resistor to match a 50-ohm PCB trace.
WEAK5     3.3-V regular impedance output, with slow rise/fall, combined
          with 5-V tolerant input.

The above pad types are used in the modes listed in the following table.

Mode      Description
IN        Input only, except during boundary scan
OUT       Output only, except during boundary scan
OD        Open drain output - active pull low, no active drive high,
          requires external pull-up
I/O       Output or input
I/OD      Open drain output with input - active pull low, no active
          drive high, requires external pull-up

Unused pins may remain floating, i.e. unconnected. All pins that drive a clock should drive it through a series resistor.
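The 27-33 ohm series resistor called for by the STRG3 and STRG5 pad types can be sanity-checked with the common source-termination rule of thumb. That rule is an assumption here, not something this data book states: the driver output impedance plus the series resistor should roughly equal the trace impedance. The sketch below back-computes the driver impedance that the recommendation would imply under that assumption. The function names are hypothetical and the results are illustrative, not specified values.

```python
def implied_driver_impedance(trace_ohms, series_ohms):
    """Driver output impedance implied by a series-terminated trace.

    Assumes the rule of thumb: driver impedance + series resistor = trace impedance.
    """
    if series_ohms > trace_ohms:
        raise ValueError("series resistor exceeds trace impedance")
    return trace_ohms - series_ohms


def series_resistor_for(trace_ohms, driver_ohms):
    """Series resistor that matches a driver to a trace under the same assumption."""
    if driver_ohms > trace_ohms:
        raise ValueError("driver impedance exceeds trace impedance")
    return trace_ohms - driver_ohms


if __name__ == "__main__":
    # Recommended resistor range from the pad type table, for a 50-ohm trace.
    for r_series in (27, 33):
        z_drv = implied_driver_impedance(50, r_series)
        print(f"{r_series} ohm series resistor -> about {z_drv} ohm driver impedance")
```

This is only a first-order check. The actual resistor value should be confirmed by signal-integrity simulation or measurement on the board.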
1.4 Signal Pin List

In the table below, a pin name ending in a "#" designates an active-low signal (the active state of the signal is a low voltage level). All other signals have active-high polarity.

Pin name        BGA ball  Pad type  Mode
    Description

Main clock interface

TRI_CLKIN       L20       NORM3     IN
    Main input clock. The SDRAM clock outputs (MM_CLK0 and MM_CLK1) can be
    set to 2x or 3x this frequency. The on-chip DSPCPU clock (DSPCPU_CLK)
    can be set to 1x, 5/4, 4/3, 3/2 or 2x the SDRAM clock frequency.
    Maximum recommended ppm level is +/- 100 ppm or lower to improve
    jitter on generated clocks. Duty cycle should not exceed 30/70%
    asymmetry. The operating limits of the internal PLLs are:
    • 27 MHz < output of the SDRAM PLL < 200 MHz
    • 33 MHz < output of the CPU PLL < 266 MHz
    These are not the speed grades of the chips, just the PLL limits.

VDDQ            K20       N/A       PWR
    Quiet VDD for the PLL subsystem. This pin should be supplied from VDD
    through a low-Q series inductor. It should be bypassed for AC to VSSQ,
    using a dual capacitor bypass (high and low frequency AC bypass).

VSSQ            L19       N/A       GND
    Quiet VSS for the PLL subsystem. Should be AC bypassed to VDDQ, but
    should otherwise be left DC floating. It is connected on-chip to VSS.
    No external coil or other connection to board ground is needed; such a
    connection would create a ground loop.

Miscellaneous system interface

TRI_RESET#      G19       WEAK5     IN
    PNX1300/01/02/11 reset input. This pin can be tied to the PCI RST#
    signal in PCI bus systems. Upon releasing reset, PNX1300/01/02/11
    initiates its boot protocol.

BOOT_CLK        T20       NORM3     IN
    Used for testing purposes. Must be connected to TRI_CLKIN for normal
    operation.

TESTMODE        P19       NORM3     IN
    Used for testing purposes. Must be connected to VSS for normal
    operation.

SCANCPU         D20       NORM3     IN
    Used for testing purposes. Must be connected to VSS for normal
    operation.

RESERVED1       E19       NORM3     I/O
    Reserved pin. Must be left unconnected for normal operation.

RESERVED2       D19       STRG5     I/O
    Reserved pin. Must be left unconnected for normal operation.

VREF_PCI        F2        N/A       PWR
    VREF_PCI determines the mode of operation of the PCI pins listed in
    Section 1.6. VREF_PCI must be connected to 5 V for use in a 5-V PCI
    signaling environment or to VSS (0 V) for use in a 3.3-V PCI signaling
    environment. The supply to this pin should be AC bypassed and provide
    40 mA of DC sink or source capability. Note that this pin cannot be
    directly connected to the PCI "I/O designated power pins" in a dual
    voltage PCI plug-in card. Board level conversion circuitry is
    required.

VREF_PERIPH     C18       N/A       PWR
    VREF_PERIPH determines the mode of operation of the I/O pins listed in
    Section 1.6. VREF_PERIPH should be connected to 5 V if any of the
    listed I/O pins must be 5-V input voltage capable. VREF_PERIPH should
    be connected to VSS (0 V) if all listed I/O pins are 3.3-V only
    inputs. The supply to this pin should be AC bypassed and provide 40 mA
    of DC sink or source capability.

TRI_USERIRQ     G20       WEAK5     IN
    General purpose level/edge interrupt input. Vectored interrupt source
    number 4.

TRI_TIMER_CLK   H19       WEAK5     IN
    External general purpose clock source for timers. Max. 40 MHz.
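The clock relationships given in the TRI_CLKIN entry (SDRAM clock at 2x or 3x TRI_CLKIN, DSPCPU clock at 1x, 5/4, 4/3, 3/2 or 2x the SDRAM clock, and the two PLL operating windows) can be checked with a few lines of arithmetic. The following is a minimal sketch under those stated numbers only. The function and constant names are hypothetical, and it tests only the PLL limits, not the speed grade of any particular part.

```python
from fractions import Fraction

# Ratios and limits as listed in the TRI_CLKIN pin description.
SDRAM_MULTIPLIERS = (2, 3)
CPU_RATIOS = (Fraction(1), Fraction(5, 4), Fraction(4, 3), Fraction(3, 2), Fraction(2))
SDRAM_PLL_LIMITS_MHZ = (27, 200)   # 27 MHz < SDRAM PLL output < 200 MHz
CPU_PLL_LIMITS_MHZ = (33, 266)     # 33 MHz < CPU PLL output < 266 MHz


def valid_clock_settings(tri_clkin_mhz):
    """Return (sdram_mult, cpu_ratio, sdram_mhz, cpu_mhz) tuples inside both PLL windows.

    Pass an int or Fraction for exact results; the limits are strict inequalities.
    """
    clkin = Fraction(tri_clkin_mhz)
    settings = []
    for mult in SDRAM_MULTIPLIERS:
        sdram = clkin * mult
        if not SDRAM_PLL_LIMITS_MHZ[0] < sdram < SDRAM_PLL_LIMITS_MHZ[1]:
            continue
        for ratio in CPU_RATIOS:
            cpu = sdram * ratio
            if CPU_PLL_LIMITS_MHZ[0] < cpu < CPU_PLL_LIMITS_MHZ[1]:
                settings.append((mult, ratio, sdram, cpu))
    return settings


if __name__ == "__main__":
    # Example input frequency chosen for illustration only.
    for mult, ratio, sdram, cpu in valid_clock_settings(50):
        print(f"SDRAM x{mult} = {float(sdram):.2f} MHz, "
              f"CPU x{ratio} = {float(cpu):.2f} MHz")
```

A setting that passes this check can still exceed the rated DSPCPU or SDRAM speed of a given device, so the result should be compared against the speed grade separately.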
philips semiconductors pin list preliminary specification 1-3 main memory interface mm_clk0 mm_clk1 y10 w10 strg3 out sdram output clock at 2x or 3x tri_ clkin frequency. two identical outputs are pro- vided to reliably drive several small memo ry configurations wi thout external glue. a series terminating resistor close to pnx1300/01/02/11 is requi red to reduce ringing. for driving a 50-ohm trace, a resistor of 27 to 33 ohm is recommended. it is recom- mended against using higher impedance traces in the sdram signals. mm_a00 mm_a01 mm_a02 mm_a03 mm_a04 mm_a05 mm_a06 mm_a07 mm_a08 mm_a09 mm_a10 mm_a11 mm_a12 mm_a13 w12 y12 w11 y11 y9 w9 v9 y8 w8 y7 v12 y13 w13 y14 norm3 out main memory address bus; used for row and column addresses warning: mm_a[13:11] do not connect directly to sdram a[13:11] pins. refer to chapter 12, ?sdram memory system? for accurate connection diagrams. mm_dq00 mm_dq01 mm_dq02 mm_dq03 mm_dq04 mm_dq05 mm_dq06 mm_dq07 mm_dq08 mm_dq09 mm_dq10 mm_dq11 mm_dq12 mm_dq13 mm_dq14 mm_dq15 mm_dq16 mm_dq17 mm_dq18 mm_dq19 mm_dq20 mm_dq21 mm_dq22 mm_dq23 mm_dq24 mm_dq25 mm_dq26 mm_dq27 mm_dq28 mm_dq29 mm_dq30 mm_dq31 y20 v18 w19 w20 u18 v19 v20 t18 w18 v17 y18 w17 y17 w16 y16 v15 w7 y6 w6 v6 y5 w5 y4 w4 v2 v3 w1 w2 y1 y2 w3 y3 norm3 i/o 32-bit data i/o bus. the main memory interface unit also s upports a 16-bit i/o interface. refer to chapter 12, ?sdram memory system.? mm_cke0 mm_cke1 y19 u1 norm3 out clock enable output to sdrams. two identic al outputs are provided in order to reli- ably drive several small memory conf igurations without external glue. mm_cs0# mm_cs1# mm_cs2# mm_cs3# u2 u20 u3 u19 norm3 out chip select for dram rank n; active low in pnx1300/01/02/11 the chip selects pins may be used as address pins to support the 256 mbit sdram device organized in x16. refer to chapter 12, ?sdram memory system.? 
mm_ras# | w14 | norm3 | out | row address strobe; active low
mm_cas# | y15 | norm3 | out | column address strobe; active low
mm_we# | w15 | norm3 | out | write enable; active low
mm_dqm0 mm_dqm1 mm_dqm2 mm_dqm3 | t19 r18 v1 v4 | norm3 | out | mm_dq mask enable; these are byte enable signals for the 32-bit mm_dq bus
pci interface (note: current buffer design allows drive/receive from either 3.3 or 5v pci bus)
pci_clk | t2 | pci | in | all pci input signals are sampled with respect to the rising edge of this clock. all pci outputs are generated based on this clock. clock is required for normal operation of the pci block.
pci_ad00 pci_ad01 pci_ad02 pci_ad03 pci_ad04 pci_ad05 pci_ad06 pci_ad07 pci_ad08 pci_ad09 pci_ad10 pci_ad11 pci_ad12 pci_ad13 pci_ad14 pci_ad15 pci_ad16 pci_ad17 pci_ad18 pci_ad19 pci_ad20 pci_ad21 pci_ad22 pci_ad23 pci_ad24 pci_ad25 pci_ad26 pci_ad27 pci_ad28 pci_ad29 pci_ad30 pci_ad31 | t1 r3 r2 r1 p2 p1 n2 n1 m2 m1 l2 l1 k1 k2 j1 j2 d1 d3 c1 b2 b1 c2 c3 a1 a3 c4 b4 a4 a5 c6 b6 a6 | pci | i/o | multiplexed address and data.
pci_c/be#0 pci_c/be#1 pci_c/be#2 pci_c/be#3 | m3 j3 d2 b3 | pci | i/o | multiplexed bus commands and byte enables. high for command, low for byte enable.
pci_par | h1 | pci | i/o | even parity across ad and c/be lines.
pci_frame# | e2 | pci | i/o | sustained tri-state. frame is driven by a master to indicate the beginning and duration of an access.
pci_irdy# | e1 | pci | i/o | sustained tri-state. initiator ready indicates that the bus master is ready to complete the current data phase.
pci_trdy# | f3 | pci | i/o | sustained tri-state. target ready indicates that the bus target is ready to complete the current data phase.
pci_stop# | g2 | pci | i/o | sustained tri-state. indicates that the target is requesting that the master stop the current transaction.
pci_idsel | a2 | pci | in | used as chip select during configuration read/write cycles.
pci_devsel# | f1 | pci | i/o | sustained tri-state. indicates whether any device on the bus has been selected.
pci_req# | b7 | pci | i/o | driven by pnx1300/01/02/11 as pci bus master to request use of the pci bus.
pci_gnt# | b5 | pci | in | indicates to pnx1300/01/02/11 that access to the bus has been granted.
pci_perr# | g1 | pci | i/o | sustained tri-state. parity error generated/received by pnx1300/01/02/11.
pci_serr# | h2 | pci | od | system error. this signal is asserted when operating as target and detecting an address parity error.
pci_inta# pci_intb# pci_intc# pci_intd# | c9 a8 b8 a7 | pciod pci pciod pciod | i/od i/o/od i/od i/od |
  - can operate as input (power up default) or output, as determined by direction control bits in pci mmio register int_ctl.
  - as input, a pci_int# pin can be used to receive pci interrupt requests (normal pci use is active low, level sensitive mode, but the vic can be set to treat these as positive edge triggered mode). as input, a pci_int# pin can also be used as a general interrupt request pin if not needed for pci.
  - as output, the value of a pci_int# can be programmed through pci mmio registers to generate interrupts for other pci masters.
  - whenever xio bus functionality is active, pci_intb# is a push-pull cmos i/o pin. when the xio bus is not active and regular pci bus functionality is activated, then pci_intb# has a pci compatible open drain output.
jtag interface (debug access port and 1149.1 boundary scan port)
jtag_tdi | f20 | weak5 | in | jtag test data input
jtag_tdo | f18 | weak5 | i/o | jtag test data output. this pin can either drive active low, high or float.
jtag_tck | f19 | weak5 | in | jtag test clock input
jtag_tms | e20 | weak5 | in | jtag test mode select input
video in
vi_clk | c20 | strg5 | i/o |
  - if configured as input (power up default): a positive transition on this incoming video clock pin samples all other vi_data input signals below if vi_dvalid is high. if vi_dvalid is low, vi_data is ignored. clock and data rates of up to 81 mhz are supported. pnx1300 series supports an additional mode where vi_data[9:8] in message passing mode are not affected by the vi_dvalid signal; see section 6.6.1 on page 6-12.
  - if configured as output: programmable output clock to drive an external video a/d converter. can be programmed to emit integral dividers of dspcpu_clk. if used as output, a board level 27-33 ohm series resistor is recommended to reduce ringing.
vi_dvalid | a17 | weak5 | in | vi_dvalid indicates that valid data is present on the vi_data lines. if high, vi_data will be accepted on the next vi_clk positive edge. if low, no vi_data will be sampled. pnx1300 series supports an additional mode where vi_data[9:8] in message passing mode are not affected by the vi_dvalid signal; see section 6.6.1 on page 6-12.
vi_data0 vi_data1 vi_data2 vi_data3 vi_data4 vi_data5 vi_data6 vi_data7 | d18 c19 b20 b19 a20 a19 c17 b18 | weak5 | in | ccir656 style yuv 4:2:2 data from a digital camera, or general purpose high speed data input pins. sampled on vi_clk if vi_dvalid high.
vi_data8 vi_data9 | a18 b17 | weak5 | in | extension high speed data input bits to allow use of 10 bit video a/d converters in raw10 modes. vi_data[8] serves as start and vi_data[9] as end message input in message passing mode. sampled on positive transitions of vi_clk if vi_dvalid high. pnx1300 series supports an additional mode where vi_data[9:8] in message passing mode are not affected by the vi_dvalid signal; see section 6.6.1 on page 6-12.
i2c interface
iic_sda | r19 | iicod | i/od | i2c serial data
iic_scl | r20 | iicod | i/od | i2c clock
video out
vo_data0 vo_data1 vo_data2 vo_data3 vo_data4 vo_data5 vo_data6 vo_data7 | p20 n19 n20 m18 m19 m20 k19 j20 | weak5 | out | ccir656 style yuv 4:2:2 digital output data, or general purpose high speed data output channel. output changes on positive edge of vo_clk.
vo_io1 | j18 | weak5 | i/o | this pin can function as hs output or as stmsg (start message) output.
  - if set as hs output, it outputs the horizontal sync signal.
  - in message passing mode, this pin acts as stmsg output.
vo_io2 | h20 | weak5 | i/o | this pin can function as fs (frame sync) input, fs output or as endmsg output.
  - if set as fs input, it can be set to respond to positive or negative edge transitions.
  - if the video out (vo) unit operates in external sync mode and the selected transition occurs, the vo unit sends two fields of video data. note: this works only once after a reset.
  - in message passing mode, this pin acts as endmsg output.
vo_clk | j19 | strg5 | i/o | the vo unit emits vo_data on a positive edge of vo_clk. vo_clk can be configured as input (reset default) or output.
  - if configured as input: vo_clk is received from external display clock master circuitry.
  - if configured as output, pnx1300/01/02/11 emits a programmable clock frequency. the emitted frequency can be set between approx. 4 and 81 mhz with a sub-hertz resolution. the clock generated is frequency accurate and has low jitter properties due to a combination of an on-chip dds (direct digital synthesizer) and vco/pll. if used as output, a board level 27-33 ohm series resistor is recommended to reduce ringing.
audio in (always acts as receiver, but can be master or slave for a/d timing)
ai_osclk | b15 | strg3 | out | over-sampling clock. this output can be programmed to emit any frequency up to 40 mhz with a sub-hertz resolution. it is intended for use as the 256fs or 384fs over-sampling clock by the external a/d subsystem. a board level 27-33 ohm series resistor is recommended to reduce ringing.
ai_sck | a16 | strg5 | i/o |
  - when the audio in (ai) unit is programmed as a serial-interface timing slave (power-up default), ai_sck is an input. ai_sck receives the serial bit clock from the external a/d subsystem. this clock is treated as fully asynchronous to the pnx1300/01/02/11 main clock.
  - when the ai unit is programmed as the serial-interface timing master, ai_sck is an output. ai_sck drives the serial clock for the external a/d subsystem. the frequency is a programmable integral divisor of the ai_osclk frequency.
  ai_sck is limited to 22 mhz. the sample rate of valid samples embedded within the serial stream is variable. if used as output, a board level 27-33 ohm series resistor is recommended to reduce ringing.
ai_sd | c15 | weak5 | in | serial data from external a/d subsystem. data on this pin is sampled on positive or negative edges of ai_sck as determined by the clock_edge bit in the ai_serial register.
ai_ws | b16 | weak5 | i/o |
  - when the ai unit is programmed as the serial-interface timing slave (power-up default), ai_ws acts as an input. ai_ws is sampled on the same edge as selected for ai_sd.
  - when audio in is programmed as the serial-interface timing master, ai_ws acts as an output. it is asserted on the opposite edge of the ai_sd sampling edge.
  ai_ws is the word-select or frame-synchronization signal from/to the external a/d subsystem.
audio out (always acts as sender, but can be master or slave for d/a timing)
ao_osclk | b14 | strg3 | out | over-sampling clock. this output can be programmed to emit any frequency up to 40 mhz, with a sub-hertz resolution. it is intended for use as the 256fs or 384fs over-sampling clock by the external d/a conversion subsystem. a board level 27-33 ohm series resistor is recommended to reduce ringing.
ao_sck | a14 | strg5 | i/o |
  - when the audio out (ao) unit is programmed to act as the serial interface timing slave (power up default), ao_sck acts as input. it receives the serial clock from the external audio d/a subsystem. the clock is treated as fully asynchronous to the pnx1300/01/02/11 main clock.
  - when the ao unit is programmed to act as serial interface timing master, ao_sck acts as output. it drives the serial clock for the external audio d/a subsystem. the clock frequency is a programmable integral divisor of the ao_osclk frequency.
  ao_sck is limited to 22 mhz. the sample rate of valid samples embedded within the serial stream is variable. if used as output, a board level 27-33 ohm series resistor is recommended to reduce ringing.
ao_sd1 | b13 | weak5 | out | serial data to external stereo audio d/a subsystem for first 2 of 8 channels. the timing of transitions on this output is determined by the clock_edge bit in the ao_serial register, and can be on positive or negative ao_sck edges.
ao_sd2 | a13 | weak5 | out | serial data.
ao_sd3 | c12 | weak5 | out | serial data.
ao_sd4 | b12 | weak5 | out | serial data.
ao_ws | a15 | weak5 | i/o |
  - when the ao unit is programmed as the serial-interface timing slave (power-up default), ao_ws acts as an input. ao_ws is sampled on the opposite ao_sck edge at which ao_sdx are asserted.
  - when the ao unit is programmed as serial-interface timing master, ao_ws acts as an output. ao_ws is asserted on the same ao_sck edge as ao_sdx.
  ao_ws is the word-select or frame-synchronization signal from/to the external d/a subsystem. each audio channel receives 1 sample for every ws period.
s/pdif output (output)
spdo | a12 | strg3 | out | self clocking serial data stream as per iec958, with 1937 extensions. note that the low impedance output buffer requires a 27 to 33 ohm series terminator close to pnx1300/01/02/11 in order to match the board trace impedance. this series terminator can be/must be part of the voltage divider needed to create the coaxial output through the ac isolation transformer.
synchronous serial interface (ssi) to an off-chip modem front-end
ssi_clk | b11 | weak5 | in | clock signal of the synchronous serial interface to an off-chip modem analog frontend or isdn terminal adapter; provided by the receive channel of an external communication device.
ssi_rxfsx | a11 | weak5 | in | receive frame sync reference of the synchronous serial interface, provided by the receive channel of an external communication device.
ssi_rxdata | a10 | weak5 | in | receive serial data input; provided by the receive channel of an external communication device.
ssi_txdata | b10 | weak5 | out | transmit serial data output; sent to the transmit channel of the external communication device.
ssi_io1 | a9 | weak5 | i/o | general purpose programmable i/o. set to input on power up.
ssi_io2 | b9 | weak5 | i/o | general purpose programmable i/o. set to input on power up. can also be programmed to function as the transmit channel frame synchronization reference output.
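The ai_sck and ao_sck descriptions above state two constraints in timing-master mode: the serial clock is an integral divisor of the over-sampling clock (ai_osclk or ao_osclk, at most 40 mhz), and it is limited to 22 mhz. A minimal sketch of that arithmetic follows. The function names are hypothetical, and the 64-bit frame in the usage note is an assumption for illustration only; the code does not model how the divisor is programmed into any register.

```python
SCK_MAX_HZ = 22_000_000    # ai_sck / ao_sck limit quoted in the pin descriptions
OSCLK_MAX_HZ = 40_000_000  # ai_osclk / ao_osclk limit quoted in the pin descriptions


def smallest_sck_divisor(osclk_hz: int) -> int:
    """Smallest integer divisor that keeps osclk / divisor at or below 22 MHz."""
    if not 0 < osclk_hz <= OSCLK_MAX_HZ:
        raise ValueError("osclk must be above 0 and at most 40 MHz")
    # Integer ceiling division, never smaller than 1.
    return max(1, -(-osclk_hz // SCK_MAX_HZ))


def divisor_for_bit_clock(osclk_hz: int, sck_hz: int) -> int:
    """Divisor producing the requested bit clock, if it is reachable."""
    if sck_hz <= 0 or sck_hz > SCK_MAX_HZ:
        raise ValueError("requested sck is outside the 22 MHz limit")
    if osclk_hz % sck_hz != 0:
        raise ValueError("sck must be an integral divisor of osclk")
    return osclk_hz // sck_hz


if __name__ == "__main__":
    fs = 48_000                      # example sample rate (assumption)
    osclk = 256 * fs                 # 256fs over-sampling clock
    print(smallest_sck_divisor(osclk))
    print(divisor_for_bit_clock(osclk, 64 * fs))  # 64 bit clocks per frame (assumption)
```

With a 256fs over-sampling clock and an assumed 64 bit clocks per word-select period, the required divisor is 4; a 40 mhz over-sampling clock needs a divisor of at least 2 to stay within the 22 mhz limit.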
1.5 power pin list
vss (ground): c5 c16 d4 d5 d16 d17 e3 e4 e17 e18 t3 t4 t17 u4 u5 u16 u17 v5 v16 h8 h9 h10 h11 h12 h13 j8 j9 j10 j11 j12 j13 k8 k9 k10 k11 k12 k13 l8 l9 l10 l11 l12 l13 m8 m9 m10 m11 m12 m13 n8 n9 n10 n11 n12 n13
vcc (3.3v i/o supply): c7 c10 c11 c14 d6 d7 d10 d11 d14 d15 f4 f17 g3 g4 g17 g18 k3 k4 k17 k18 l3 l4 l17 l18 p3 p4 p17 p18 r4 r17 u6 u7 u10 u11 u14 u15 v7 v10 v11 v14
vdd (2.5v core supply): c8 c13 d8 d9 d12 d13 h3 h4 h17 h18 j4 j17 m4 m17 n3 n4 n17 n18 u8 u9 u12 u13 v8 v13
1.6 pin reference voltage
with the exception of open drain mode outputs, outputs always drive to a level determined by the 3.3-v i/o voltage. vref_periph and vref_pci purely determine input voltage clamping, not input signal thresholds or output levels.
vref_pci determined mode: pci_ad[31:00] pci_clk pci_c/be#0 pci_c/be#1 pci_c/be#2 pci_c/be#3 pci_par pci_frame# pci_irdy# pci_trdy# pci_stop# pci_idsel pci_devsel# pci_req# pci_gnt# pci_perr# pci_serr# pci_inta# pci_intb# pci_intc# pci_intd#
vref_periph determined mode: tri_reset# tri_userirq tri_timer_clk jtag_tdi jtag_tdo jtag_tck jtag_tms vi_clk vi_dvalid vi_data[9:0] iic_sda iic_scl vo_io1 vo_io2 vo_clk ai_sck ai_sd ai_ws ao_sck ao_ws ssi_clk ssi_rxfsx ssi_rxdata ssi_io1 ssi_io2 reserved2
sdram i/f (always 3.3-volt mode): mm_clk0 mm_clk1 mm_a[13:00] mm_dq[31:00] mm_dqm0 mm_dqm1 mm_dqm2 mm_dqm3 mm_cke0 mm_cke1 mm_cs0# mm_cs1# mm_cs2# mm_cs3# mm_ras# mm_cas# mm_we#
inputs always in 3.3-v mode: tri_clkin boot_clk testmode scancpu reserved1
output only pins: vo_data[7:0] ai_osclk ao_osclk ao_sd1 ao_sd2 ao_sd3 ao_sd4 ssi_txdata spdo
1.7 package
[figure: package outline sot553-1. hbga292: plastic, heatsink ball grid array package; 292 balls; body 27 x 27 x 1.75 mm. ball rows a to y, ball columns 1 to 20, ball a1 index area marked. dimensions in mm.]
1.8 ordering information
1.8.1 lead parts:
last time buy for these parts is september 30, 2005:
- to order 143-mhz/2.5v product, part number is "pnx1300eh", 12 nc product code 9352 7097 6557. end of life 09/30/08.
- to order 180-mhz/2.5v product, part number is "pnx1301eh", 12 nc product code 9352 7097 9557. end of life 09/30/08.
- to order 200-mhz/2.5v product, part number is "pnx1302eh", 12 nc product code 9352 7098 2557. end of life 09/30/08.
- to order 166-mhz/2.2v product, part number is "pnx1311eh", 12 nc product code 9352 7098 5557. end of life 09/30/08.
1.8.2 lead-free parts:
available for ordering starting october 1, 2004:
- to order 143-mhz/2.5v product, part number is "pnx1300eh/g", 12 nc product code 9352 7771 6557.
- to order 180-mhz/2.5v product, part number is "pnx1301eh/g", 12 nc product code 9352 7771 7557.
- to order 200-mhz/2.5v product, part number is "pnx1302eh/g", 12 nc product code 9352 7771 8557.
- to order 166-mhz/2.2v product, part number is "pnx1311eh/g", 12 nc product code 9352 7772 1557.
1.9 parametric characteristics
1.9.1 pnx1300/01/02/11 absolute maximum ratings
permanent damage may occur if these conditions are exceeded.
symbol | parameter | min. | max | units | notes
v ddmax | 2.5-v core supply voltage (pnx1300/01/02/11) | -0.5 | 3.5 | v |
v ccmax | 3.3-v i/o supply voltage | -0.5 | 4.6 | v |
v i-5v | dc input voltage on all 5-v pins | -0.5 | vx+0.5 | v | 1
v i-3.3v | dc input voltage on all 3.3-v pins | -0.5 | vcc+0.3 | v |
t stg | storage temperature range | -65 | 150 | deg. c |
t casemax | maximum case temperature range | 0 | 120 | deg. c |
hbm esd | human body model electrostatic handling for all pins | - | - | class 1c | 2
mm esd | machine model electrostatic handling for all pins | - | - | class a | 3
notes:
1. vx for a 5v mode pin is either vref_pci or vref_periph, see section 1.6.
2. jedec standard, june 2000
3. jedec standard, october 1997
1.9.2 pnx1300/01/02 operating range and thermal characteristics
functional operation, long-term reliability and ac/dc characteristics are guaranteed for the operating conditions below.
symbol | parameter | minimum | typical | maximum | units
v dd | pnx1300/01/02 core supply voltage | 2.375 | 2.50 | 2.625 | v
v cc | i/o supply voltage | 3.135 | 3.30 | 3.465 | v
t case | operating case temperature range | 0 | | 85 | c
θ jt | junction to case thermal resistance | 3.8 | c/w
θ ja | junction to ambient thermal resistance (natural convection) | 15 | c/w
1.9.3 pnx1311 operating range and thermal characteristics
functional operation, long-term reliability and ac/dc characteristics are guaranteed for the operating conditions below.
symbol | parameter | minimum | typical | maximum | units
v dd | pnx1311 core supply voltage | 2.090 | 2.20 | 2.310 | v
v cc | i/o supply voltage | 3.135 | 3.30 | 3.465 | v
t case | operating case temperature range | 0 | | 85 | c
θ jt | junction to case thermal resistance | 3.8 | c/w
θ ja | junction to ambient thermal resistance (natural convection) | 15 | c/w
1.9.4 pnx1300/01/02/11 power supply sequencing
power application and power removal should obey the following rule: v dd should never exceed v cc by more than 0.5 v. permanent damage may occur if this rule is not observed.
similarly, if the device is operated in 5v input tolerant mode, the 5v power supply must be present first: v dd and v cc should never exceed the 5v reference voltage (vref_periph and vref_pci) by more than 0 v. permanent damage may occur if this rule is not observed.
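The two power supply sequencing rules of section 1.9.4 can be expressed as a check on instantaneous rail voltages, for example on samples captured during power-up and power-down. The sketch below is only an illustration of the stated inequalities; the function name and the idea of passing `None` when the reference pins are tied to vss (3.3-v only mode) are assumptions, not part of the data book.

```python
from typing import List, Optional


def sequencing_violations(vdd: float, vcc: float,
                          vref_5v: Optional[float] = None) -> List[str]:
    """Return the section 1.9.4 rules broken by one set of rail voltages (volts).

    vref_5v is the 5v reference on vref_periph / vref_pci; pass None when the
    device is not operated in 5v input tolerant mode (assumption of this sketch).
    """
    broken = []
    # Rule 1: vdd should never exceed vcc by more than 0.5 v.
    if vdd > vcc + 0.5:
        broken.append("vdd exceeds vcc by more than 0.5 v")
    # Rule 2 (5v tolerant mode only): vdd and vcc should never exceed the 5v reference.
    if vref_5v is not None:
        if vdd > vref_5v:
            broken.append("vdd exceeds the 5v reference")
        if vcc > vref_5v:
            broken.append("vcc exceeds the 5v reference")
    return broken


if __name__ == "__main__":
    # Steady state for a 2.5 v core part in 5v tolerant mode.
    print(sequencing_violations(2.5, 3.3, 5.0))
    # Core supply up before the i/o supply.
    print(sequencing_violations(2.5, 0.0, 5.0))
```

Running the check over every sample of a captured power-up ramp, rather than only the final values, is what catches a core rail that rises ahead of the i/o rail.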
1.9.5 pnx1300/01/02 dc/ac characteristics
symbol | parameter | condition/notes | min. | max | units
v dd | core supply voltage | | 2.375 | 2.625 | v
v cc | i/o supply voltage | | 3.135 | 3.465 | v
i dd-typ | core supply current | 200 mhz cpu operation (max. application) | 1400 | ma
i cc-typ | i/o supply current | 183 mhz sdram operation (max. application) | 160 | ma
i dd-pdn | core supply current | cpu power down mode; 200 mhz | 300 | ma
i cc-pdn | i/o supply current | cpu power down mode; 183 mhz | 50 | ma
v ih-5v | input high voltage for i/o-5 v | note 1. all i/os except iicod | 2.0 | vx+0.5 | v
v ih-3.3v | input high voltage for i/o-3.3 v | all i/os except iicod | 2.0 | v cc +0.3 | v
v il-5v | input low voltage for i/o-5 v | all i/os except iicod | -0.5 | 0.8 | v
v il-3.3v | input low voltage for i/o-3.3 v | all i/os except iicod | -0.3 | 0.8 | v
i il-5v | input leakage current for i/o-5 v | 0 < v in < 2.7v | -70 | 70 | ua
i il-3.3v | input leakage current for i/o-3.3 v | 0 < v in < 2.7v | -0 | 10 | ua
c in | input pin capacitance | | 8 | pf
notes:
1. vx for a 5v mode pin is either vref_pci or vref_periph, see section 1.6.
1.9.6 pnx1311 dc/ac characteristics
symbol | parameter | condition/notes | min. | max | units
v dd | core supply voltage | | 2.090 | 2.310 | v
v cc | i/o supply voltage | | 3.135 | 3.465 | v
i dd-typ | core supply current | 166 mhz cpu operation (max. application) | 1110 | ma
i cc-typ | i/o supply current | 166 mhz sdram operation (max. application) | 145 | ma
i dd-pdn | core supply current | cpu power down mode; 166 mhz | 215 | ma
i cc-pdn | i/o supply current | cpu power down mode; 166 mhz | 46 | ma
v ih-5v | input high voltage for i/o-5 v | note 1. all i/os except iicod | 2.0 | vx+0.5 | v
v ih-3.3v | input high voltage for i/o-3.3 v | all i/os except iicod | 2.0 | v cc +0.3 | v
v il-5v | input low voltage for i/o-5 v | all i/os except iicod | -0.5 | 0.8 | v
v il-3.3v | input low voltage for i/o-3.3 v | all i/os except iicod | -0.3 | 0.8 | v
i il-5v | input leakage current for i/o-5 v | 0 < v in < 2.7v | -70 | 70 | ua
i il-3.3v | input leakage current for i/o-3.3 v | 0 < v in < 2.7v | -0 | 10 | ua
c in | input pin capacitance | | 8 | pf
notes:
1. vx for a 5v mode pin is either vref_pci or vref_periph, see section 1.6.
1.9.7 pnx1300 series power consumption
the power consumption of pnx1300 series depends on the activity of the dspcpu, the number of peripherals being used, the frequency at which the system is running, and the loads on the pins.
the first section presents the power consumption for known applications. the other power related sections present the maximum power consumption. these maximum values are obtained with a "fake" application that turns on all the peripherals and runs intensive compute on the cpu.
1.9.7.1 power consumption for applications on pnx1300 series
table 1-1 and table 1-2 each present the power consumption for two typical applications, drawn from the following:
- the dvd playback includes video display using the vo peripheral and audio streaming using the ao peripheral. the bitstream is brought into the tm-1300 system over the pci peripheral. the vld co-processor is used to perform the bitstream parsing. the bitstream is not scrambled, therefore the dvdd co-processor is not used and it is turned off.
- the mpeg4 application includes video and audio playback of an encoded cif stream. the bitstream is brought into the pnx1300 system over the pci peripheral. the video and audio subsystems of the pnx1300 were used to render the video and sound from the decoded stream into the video monitor and speakers.
- the h263 video conferencing application includes the following steps. it captures a ccir656 video stream at 30 frames/second using the vi peripheral. the incoming video stream is downscaled, on the fly, to sif resolution by vi. the captured frames are then downscaled to a qsif resolution using the icp co-processor. the resulting qsif image is sent over the pci bus via the icp co-processor to a svga card (pc monitor display) and encoded by the dspcpu. the resulting bitstream is then decoded by the dspcpu and displayed as a sif image on the same pc monitor (also using the icp co-processor). all the encoding/decoding part is done in the yuv color space. the display is in the rgb16 color space. software is not optimized.
three main techniques may be applied to reduce the "out of the box" power consumption:
- turn off the unused peripherals. refer to section 21.6 on page 21-2.
- run the system at the required speed, i.e. some applications may not need to run at the full speed grade of the chip.
- power down the system or the dspcpu each time the dspcpu reaches the idle task.
a more detailed description can be found in the application note "tm-1300 power saving features" available at the following website: http://www.semiconductors.philips.com/trimedia/
as previously mentioned, table 1-1 and table 1-2 show that the final power consumption for a realistic application may be lower than the values reported in the next section.
based on these results and the following section, the power consumption of pnx1300 series, using an artificial scenario depicting an extremely demanding application, for commonly used speeds, is as follows:
- pnx1300/01/02 is < 3.4 w @ 166:133 mhz
- pnx1311 is < 2.9 w @ 166:133 mhz
- pnx1302 is < 4.0 w @ 200:133 mhz
table 1-1. power consumption of example applications for pnx1300/01/02 (vdd = 2.5v)
applications | after power optimizations | without power optimizations | optimization: unused peripherals turned off | optimization: system speed adjustment | optimization: idle task power management
dvd playback | 2.2 w | 3.0 w @ 180 mhz | 2.6 w @ 180 mhz | 2.6 w @ 180 mhz | 2.2 w @ 180 mhz
h.263 vconf | 1.7 w | 2.9 w @ 166 mhz | 2.7 w @ 166 mhz | 1.9 w @ 111 mhz | 1.7 w @ 111 mhz
table 1-2. power consumption of example applications for pnx1311 (vdd = 2.2v)
applications | after power optimizations | without power optimizations | optimization: unused peripherals turned off | optimization: system speed adjustment | optimization: idle task power management
mpeg4 (cif) a/v playback | 1.2 w | 2.5 w @ 166 mhz | 2.1 w @ 166 mhz | 1.3 w @ 70 mhz | 1.2 w @ 70 mhz
h.263 vconf | 1.5 w | 2.4 w @ 166 mhz | 2.2 w @ 166 mhz | 1.7 w @ 111 mhz | 1.5 w @ 111 mhz
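To put the figures of table 1-1 and table 1-2 in perspective, the relative saving between the "without power optimizations" and "after power optimizations" columns can be computed directly. The short sketch below only restates the table values; the dictionary layout and the function name are illustrative choices, not something defined by the data book.

```python
# (without optimizations, after optimizations) in watts, copied from tables 1-1 and 1-2.
MEASURED_W = {
    ("pnx1300/01/02", "dvd playback"): (3.0, 2.2),
    ("pnx1300/01/02", "h.263 vconf"): (2.9, 1.7),
    ("pnx1311", "mpeg4 (cif) a/v playback"): (2.5, 1.2),
    ("pnx1311", "h.263 vconf"): (2.4, 1.5),
}


def saving_percent(before_w: float, after_w: float) -> float:
    """Relative power reduction in percent."""
    return 100.0 * (before_w - after_w) / before_w


if __name__ == "__main__":
    for (part, app), (before, after) in MEASURED_W.items():
        print(f"{part:14s} {app:26s} {saving_percent(before, after):5.1f} %")
```

For these four measurements the three techniques together remove roughly a quarter to a half of the unoptimized power.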
1.9.7.2 pnx1300/01/02 dspcpu core current and power consumption
symbol current/notes | pnx1300 143:143 (pwd typ max) | pnx1301 166:133 (pwd typ max) | pnx1302 192:144 (pwd typ max) | pnx1302 200:133 (pwd typ max) | units
pnx130x (note 1)
i dd | 225 1125 1200 | 250 1200 1300 | 300 1380 1475 | 300 1400 1525 | ma
i cc | 40 125 135 | 40 120 135 | 40 130 135 | 36 125 130 | ma
total power dissipation | 0.8 3.2 3.5 | 0.8 3.4 3.7 | 0.9 3.9 4.1 | 0.9 4.0 4.2 | w
i dd , dspcpu only | - 820 920 | - 900 1030 | - 1030 1200 | - 1050 1250 | ma
i cc , dspcpu only | - 55 45 | - 50 45 | - 55 45 | - 55 45 | ma
power dspcpu only | - 2.2 2.5 | - 2.4 2.7 | - 2.8 3.1 | - 2.8 3.3 | w
pnx130x (note 1,2)
i dd , standby | - 550 - | - 615 - | - 720 - | - 740 - | ma
power standby | - 1.5 - | - 1.7 - | - 1.9 - | - 2.0 - | w
i dd , standby + bpwd | - 405 - | - 450 - | - 525 - | - 540 - | ma
power standby + bpwd | - 1.1 - | - 1.2 - | - 1.4 - | - 1.5 - | w
notes:
1. consumption for pnx1300/01/02 is organized in several categories. the "typ" column shows current consumption for a typical application with a cpi (clocks per instruction) of 1.4. the "max" column provides current consumption for an application with a cpi of 1.1. the measurements were taken with all the peripheral units turned on (peripherals run on a random data pattern at the specified frequencies, except for vo which runs at 27 mhz). this "max" data represents an application that heavily uses the dspcpu and does not reflect a realistic application; it is used to determine peak currents. the "typ" measurements reflect real applications. the "pwd" column shows current consumption when global powerdown mode is activated. see chapter 21, "power management."
2. standby rows indicate current consumption when the dspcpu is maintained under reset (see section 11.6.5, "biu_ctl register"), all peripherals turned off (i.e. not enabled) and all peripherals powered down (+ bpwd row).
3. measurement accuracy is +/- 5%. measurements are done with vdd set to 2.5v and vcc set to 3.3v.
4. currents do not scale with frequency unless the cpu to sdram ratio is maintained. as an example, the data for cpu to sdram ratio 1:1 for 183:183 mhz can be calculated by using the data from the 143:143 mhz column, and scaling the currents by a factor of 1.279.
1.9.7.3 pnx1311 dspcpu core current and power consumption details
symbol current/notes | pnx1311 100:100 (pwd typ max) | pnx1311 143:143 (pwd typ max) | pnx1311 166:166 (pwd typ max) | pnx1311 166:133 (pwd typ max) | units
pnx131x (note 1)
i dd | 129 670 720 | 185 955 1025 | 215 1110 1200 | 200 1032 1100 | ma
i cc | 28 87 100 | 40 125 140 | 46 145 170 | 37 123 130 | ma
total power dissipation | 0.4 1.8 1.9 | 0.5 2.5 2.7 | 0.6 2.9 3.2 | 0.6 2.7 2.9 | w
i dd , dspcpu only | - 490 550 | - 700 785 | - 815 915 | - 756 880 | ma
i cc , dspcpu only | - 38 31 | - 55 45 | - 65 55 | - 50 45 | ma
power dspcpu only | - 1.2 1.3 | - 1.7 1.9 | - 2.0 2.2 | - 1.8 2.1 | w
pnx131x (note 1,2)
i dd , standby | - 325 - | - 460 - | - 535 - | - 518 - | ma
power standby | - 0.8 - | - 1.1 - | - 1.3 - | - 1.3 - | w
i dd , standby + bpwd | - 240 - | - 340 - | - 395 - | - 375 - | ma
power standby + bpwd | - 0.6 - | - 0.9 - | - 1.0 - | - 0.9 - | w
notes:
1. consumption for pnx1311 is organized in several categories. the "typ" column shows current consumption for a typical application with a cpi (clocks per instruction) of 1.4. the "max" column provides current consumption for an application with a cpi of 1.1. the measurements were taken with all the peripheral units turned on (peripherals run on a random data pattern at the specified frequencies, except for vo which runs at 27 mhz). this "max" data represents an application that heavily uses the dspcpu and does not reflect a realistic application; it is used to determine peak currents. the "typ" measurements reflect real applications. the "pwd" column shows current consumption when global powerdown mode is activated. see chapter 21, "power management."
2. standby rows indicate current consumption when the dspcpu is maintained under reset (see section 11.6.5, "biu_ctl register"), all peripherals turned off (i.e. not enabled) and all peripherals powered down (+ bpwd row).
3. measurement accuracy is +/- 5%. measurements are done with vdd set to 2.2v and vcc set to 3.3v.
4. currents do not scale with frequency unless the cpu to sdram ratio is maintained.
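Two pieces of arithmetic sit behind the dspcpu tables above: the relation between the listed currents and the listed power, and the frequency scaling described in note 4. The sketch below assumes that total power is simply i dd x vdd + i cc x vcc; the data book does not state this formula, but it reproduces the "typ" totals checked here within rounding. Function names are hypothetical.

```python
VCC_V = 3.3  # i/o supply used for the measurements (note 3)


def total_power_w(idd_ma: float, icc_ma: float, vdd_v: float,
                  vcc_v: float = VCC_V) -> float:
    """Supply power in watts from core and i/o currents given in mA (assumed formula)."""
    return (idd_ma * vdd_v + icc_ma * vcc_v) / 1000.0


def scale_current_ma(current_ma: float, f_from_mhz: float, f_to_mhz: float) -> float:
    """Scale a tabulated current linearly with frequency.

    Per note 4 this is only meaningful when the cpu:sdram ratio is unchanged,
    for example from 143:143 to 183:183.
    """
    return current_ma * f_to_mhz / f_from_mhz


if __name__ == "__main__":
    # pnx1300 at 143:143, typ column: i dd = 1125 mA, i cc = 125 mA, vdd = 2.5 v.
    print(round(total_power_w(1125, 125, 2.5), 1))   # 3.2, as in the table
    # pnx1311 at 166:166, typ column: i dd = 1110 mA, i cc = 145 mA, vdd = 2.2 v.
    print(round(total_power_w(1110, 145, 2.2), 1))   # 2.9, as in the table
    # note 4 example: 143:143 data scaled to 183:183 (factor 183/143, quoted as 1.279).
    print(scale_current_ma(1125, 143, 183))
```

Because the measurement accuracy is quoted as +/- 5%, small differences between computed and tabulated totals, for example in some "max" columns, are to be expected.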
1.9.7.4 pnx1300/01/02 current consumption for on-chip peripherals
notes:
1. the "pwd" column for peripheral units indicates current savings when block powerdown is activated, compared to when the block is idle. see chapter 21, "power management" for block powerdown activation.
2. the "typ" column for peripheral units indicates the current required when the data pattern is random. the "max" column indicates current ratings when data is switching from high to low level each cycle. again, the "max" column is there to show peak current and does not represent a real application. for both columns the current reported is the current required by the peripheral as well as the internal bus and mmi to transfer the data to/from the peripheral unit.
3. some currents are not reported because they are difficult to measure or because they are not relevant. for example, ssi current is difficult to measure because it heavily involves the dspcpu, which makes it almost impossible to separate the current consumed by the ssi from that consumed by the dspcpu.
4. measurement accuracy is +/- 5%. measurements are done with vdd set to 2.5v and vcc set to 3.3v.
5. currents do not scale with frequency if the cpu:sdram ratios are different. the same ratio must be used.
symbol current/notes | pnx1300 143:143 (pwd typ max) | pnx1301 166:133 (pwd typ max) | pnx1302 192:144 (pwd typ max) | pnx1302 200:133 (pwd typ max) | units
vo 27 mhz
i dd , running raw mode | 50 28 39 | 55 29 38 | 65 16 26 | 72 27 36 | ma
i cc , running raw mode | - 9 17 | - 12 17 | - 12 17 | - 12 17 | ma
vo 81 mhz
i dd , running raw mode | - 23 75 | - 33 54 | - 30 58 | - 47 72 | ma
i cc , running raw mode | - 33 51 | - 37 51 | - 36 52 | - 36 52 | ma
vi 27 mhz
i dd , running raw mode | 6 8 18 | 6 6 18 | 7 8 18 | 7 6 18 | ma
i cc , running raw mode | - 7 14 | - 6 14 | - 8 15 | - 9 15 | ma
ao 44 khz
i dd , stereo 16-bit | 2 3 1 | 1 3 1 | 1 3 4 | 5 3 3 | ma
i cc , stereo 16-bit | - 2 1 | - 1 1 | - 1 1 | - 1 1 | ma
ai 44 khz
i dd , stereo 16-bit | 1 2 2 | 1 3 3 | 1 3 2 | 1 3 3 | ma
i cc , stereo 16-bit | - 1 1 | - 1 1 | - 1 1 | - 1 1 | ma
spdif 48 khz
i dd running pcm audio | 2 3 2 | 2 3 1 | 3 3 3 | 4 2 2 | ma
i cc running pcm audio | - 3 3 | - 2 2 | - 2 2 | - 2 2 | ma
icp
i dd , mem. block move | 61 95 176 | 67 95 170 | 80 105 188 | 86 106 184 | ma
i cc , mem. block move | - 28 28 | - 27 54 | - 30 61 | - 29 59 | ma
pci 33 mhz
i dd , dma transfer | - 37 83 | - 34 80 | - 32 83 | - 40 53 | ma
i cc , dma transfer | - 58 102 | - 58 102 | - 58 104 | - 58 82 | ma
vld
i dd | 3 - - | 5 - - | 6 - - | 6 - - | ma
i cc | - - - | - - - | - - - | - - - | ma
ssi 10 mhz
i dd | 4 - - | 5 - - | 6 - - | 6 - - | ma
i cc | - - - | - - - | - - - | - - - | ma
dvdd
i dd | 18 - - | 21 - - | 24 - - | 24 - - | ma
i cc | - - - | - - - | - - - | - - - | ma
1.9.7.5 pnx1311 current consumption for on-chip peripherals
notes:
1. the "pwd" column for peripheral units indicates current savings when block powerdown is activated, compared to when the block is idle. see chapter 21, "power management" for block powerdown activation.
2. the "typ" column for peripheral units indicates the current required when the data pattern is random. the "max" column indicates current ratings when data is switching from high to low level each cycle. again, the "max" column is there to show peak current and does not represent a real application. for both columns the current reported is the current required by the peripheral as well as the internal bus and mmi to transfer the data to/from the peripheral unit.
3. some currents are not reported because they are difficult to measure or because they are not relevant. for example, ssi current is difficult to measure because it heavily involves the dspcpu, which makes it almost impossible to separate the current consumed by the ssi from that consumed by the dspcpu.
4. measurement accuracy is +/- 5%. measurements are done with vdd set to 2.2v and vcc set to 3.3v.
5. currents do not scale with frequency if the cpu:sdram ratios are different. the same ratio must be used.
                                          PNX1311-100:100  PNX1311-143:143  PNX1311-166:166  PNX1311-166:133  Units
Symbol        Current/notes                 pwd  typ  max    pwd  typ  max    pwd  typ  max    pwd  typ  max
VO 27 MHz     I_DDL, running raw mode        33   17   23     47   25   33     56   29   38     48   24   31  mA
              I_CC, running raw mode          -    8   12      -   12   17      -   14   20      -   25   17  mA
VO 81 MHz     I_DDL, running raw mode         -   14   31      -   20   44      -   23   51      -   33   54  mA
              I_CC, running raw mode          -   25   36      -   36   52      -   42   60      -   37   51  mA
VI 27 MHz     I_DDL, running raw mode         3    5    8      5    7   11      6    8   13      5    7   15  mA
              I_CC, running raw mode          -    6   10      -    9   15      -   10   17      -    8   15  mA
AO 44 kHz     I_DDL, stereo 16-bit            4    2    1      6    3    2      7    3    2      1    2    2  mA
              I_CC, stereo 16-bit             -    1    1      -    1    1      -    1    1      -    1    1  mA
AI 44 kHz     I_DDL, stereo 16-bit            1    1    1      1    2    2      1    2    2      1    2    3  mA
              I_CC, stereo 16-bit             -    1    1      -    1    1      -    1    1      -    1    1  mA
SPDIF 48 kHz  I_DDL, running PCM audio        2    2    1      3    3    2      3    3    2      2    2    2  mA
              I_CC, running PCM audio         -    1    1      -    2    2      -    2    2      -    2    2  mA
ICP           I_DDL, mem. block move         40   55  101     57   79  144     66   92  167     60   76  136  mA
              I_CC, mem. block move           -   19   38      -   27   55      -   31   64      -   26   54  mA
PCI 33 MHz    I_DDL, DMA transfer             -   17   36      -   25   51      -   29   59      -   20   50  mA
              I_CC, DMA transfer              -   41   57      -   58   82      -   67   95      -   45   81  mA
VLD           I_DDL                           3    -    -      4    -    -      5    -    -      4    -    -  mA
              I_CC                            -    -    -      -    -    -      -    -    -      -    -    -  mA
SSI 10 MHz    I_DDL                           2    -    -      3    -    -      3    -    -      4    -    -  mA
              I_CC                            -    -    -      -    -    -      -    -    -      -    -    -  mA
DVDD          I_DDL                          11    -    -     16    -    -     19    -    -     18    -    -  mA
              I_CC                            -    -    -      -    -    -      -    -    -      -    -    -  mA
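The measurement conditions in note 4 allow a tabulated current to be converted into an approximate power figure. The sketch below does this for the "typ" ICP memory block move on a PNX1311-143:143. It assumes that I_DDL is drawn from the 2.2 V VDD supply and I_CC from the 3.3 V VCC supply; the constant and function names are illustrative and do not come from any Philips software.

```python
# Supply voltages used for the measurements (note 4 of the table notes).
VDD_V = 2.2  # assumed to be the supply behind the I_DDL rows
VCC_V = 3.3  # assumed to be the supply behind the I_CC rows


def power_mw(current_ma: float, supply_v: float) -> float:
    """Return power in milliwatts for a current in mA at a supply in volts."""
    return current_ma * supply_v


# ICP, mem. block move, PNX1311-143:143, "typ" column of the table.
icp_core_mw = power_mw(79, VDD_V)
icp_io_mw = power_mw(27, VCC_V)
icp_total_mw = icp_core_mw + icp_io_mw

print(f"ICP core: {icp_core_mw:.1f} mW, I/O: {icp_io_mw:.1f} mW, "
      f"total: {icp_total_mw:.1f} mW")
```

Because each row already includes the internal bus and MMI current (note 2), adding rows for several peripherals in this way probably overestimates the combined consumption.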
pnx1300/01/02/11 data book philips semiconductors 1-18 preliminary specification 1.9.7.6 strg3, strg5 type i/o circuit 1.9.7.7 norm3 type i/o circuit 1.9.7.8 weak5 type i/o circuit 1.9.7.9 iicod (i 2 c) type i/o circuit pnx1300/01/02/11 symbol parameter condition/notes min. nominal max units v oh output high voltage i out = 16.0 ma 0.9v cc v v ol output low voltage i out = -16.0 ma 0.1v cc v z oh output ac impedance high level output state 11 ohm z ol output ac impedance low level output state 11 ohm t r output rise time test load of figure 1-1 .2.0ns t r output fall time test load of figure 1-1 .2.0ns pnx1300/01/02/11 symbol parameter condition/notes min. nominal max. units v oh output high voltage i out = 8.0 ma 0.9v cc v v ol output low voltage i out = -8.0 ma 0.1v cc v z oh output ac impedance high level output state 23 ohm z ol output ac impedance low level output state 23 ohm t r output rise time test load of figure 1-2 .4.0ns t r output fall time test load of figure 1-2 .4.0ns pnx1300/01/02/11 symbol parameter condition/notes min. nominal max. units v oh output high voltage i out = 6.0 ma 0.9v cc v v ol output low voltage i out = -6.0 ma 0.1v cc v z oh output ac impedance high level output state 33 ohm z ol output ac impedance low level output state 33 ohm t r output rise time test load of figure 1-3 .4.0ns t r output fall ti me test load of figure 1-3 .4.0ns symbol parameter condition/notes min. nominal max. units v il-iic input low voltage -0.5 1.0 v v ih-iic input high voltage vx is 3.3v or 5v depending on vref_periph value 2.3 vx+0.5 v v hys input schmitt trigger hysteresis 0.25 v v ol output low voltage i out = -6.0 ma 0.6 v t f output fall time 10 - 400 pf load 1.5 250 ns
1.9.7.10 SDRAM interface timing for PNX1300/01/02/11 speed grades

Notes:
1. For best high-speed SDRAM operation, 50-ohm matched PCB traces are recommended for all MM_xxx signals. Use 27-33 ohm series terminator resistors close to the PNX1300/01/02/11 in the MM_CLK0 and MM_CLK1 lines only.
2. Equal load circuit. MM_CLK0 and MM_CLK1 are matched output buffers.
3. The center of the two rising edges on MM_CLK0 and MM_CLK1 is used as the clock reference point. The propagation delay guarantee is defined from the 50% point of the clock edge to the 50% level on D/A/C. The output hold time guarantee is defined from the 50% point of the clock edge to the 50% level on D/A/C.
4. MM_CLK0 is used as the reference clock. The input setup time requirement is defined from the data value 50% complete to the 50% level on the clock. The input hold time requirement is defined as the minimum time from the 50% level on the clock to a 50% change on the data.

1.9.7.11 PCI bus timing

The following specifications meet the PCI Specification, Rev. 2.1, for 33-MHz bus operation.

Notes:
1. See the timing measurement conditions in Figure 1-4.
2. Minimum times are measured at the package pin with the load circuit shown in Figure 1-8. Maximum times are measured with the load circuits shown in Figure 1-6 and Figure 1-7.
3. REQ# and GNT# are point-to-point signals and have different input setup times. All other signals are bused.
4. See the timing measurement conditions in Figure 1-5.
5. RST# is asserted and de-asserted asynchronously with respect to CLK.
6. All output drivers are floated when RST# is active.
7. For the purpose of active/float timing measurements, the Hi-Z or "off" state is defined to be when the total current delivered through the component pin is less than or equal to the leakage current specification.
pnx1300 143 pnx1301 166 pnx1301 180 pnx1311 166 pnx1302 200 n o t e s symbol parameter min max min max min max min max min max units f sdram mm_clk frequency 143 166 166 166 183 mhz 1 t cs skew between mm_clk0, clk1 0.05 0.05 0.05 0.05 0.05 ns 2 t pd propagation delay of data, address, control 4.7 4.2 4.2 4.2 3.7 ns 3 t oh output hold time of data, address and control 1.5 1.5 1.5 1.5 1.5 ns 3 t su input data setup time 0 0 0 0 0 ns 4 t ih input data hold time 2.0 1.5 1.5 1.5 1.5 ns 4 symbol parameter min. max units notes t val-pci (bus) clk to signal valid delay, bused signals 2 11 ns 1,2,3 t val-pci (ptp) clk to signal valid delay, poi nt-to-point si gnals 2 12 ns 1,2,3 t on-pci float to active delay 2 ns 1 t off-pci active to float delay 28 ns 1,7 t su-pci input setup time to clk - bused signals 7 ns 3,4 t su-pci (ptp) input setup time to clk - point-to-point signals 12 ns 3,4 t h-pci input hold time from clk 0.2 1 1. pci clock skew between two pci devices must be lower than 1.8ns instead of the 2 ns as specified in pci 2.1 specification ns 4 t rst-pci reset active time after power stable 1 ms 5 t rst-clk-pci reset active time after clk stable 100 s5 t rst-off-pci reset active to output float delay 40 ns 5,6,7
pnx1300/01/02/11 data book philips semiconductors 1-20 preliminary specification 1.9.7.12 jtag i/o timing notes: 1. see the timing m easurement conditions in figure 1-10 . 2. see the timing measurement conditions in figure 1-9 . 1.9.7.13 i 2 c i/o timing notes: 1. see the timing m easurement conditions in figure 1-11 . 2. see the timing measurement conditions in figure 1-12 . 3. see the timing measurement conditions in figure 1-13 . 4. see the timing measurement conditions in figure 1-14 . 5. see the timing measurement conditions in figure 1-15 . 1.9.7.14 video in i/o timing notes: 1. see the timing m easurement conditions in figure 1-16 . 1.9.7.15 video out i/o timing notes: 1. see the timing m easurement conditions in figure 1-17 . 2. see the timing measurement conditions in figure 1-18 . 3. clkout asserted, i.e. the vo unit is the source of vo_clk 4. clkout negated, i.e. the external world is the source of vo_clk symbol parameter min. max units notes f jtag-clk jtag clock frequency 20 mhz t clk-tdo jtag_tck to jtag_tdo valid delay 2 10 ns 1 t su-tck input setup time to jtag_tck 3 ns 2 t h-tck input hold time from jtag_tck 7 ns 2 symbol parameter min. max units notes f scl scl clock frequency 400 khz 1 t buf bus free time 1 s2 t su-sta start condition set up time 1 s3 t h-sta start condition hold time 1 s3 t low scl low time 1 s1 t high scl high time 1 s1 t f scl and sda fall time (cb = 10-400 pf, from v ih-iic to v il-iic ) 20+0.1cb 250 ns 1 t su-sda data setup time 100 ns 4 t h-sda data hold time 0 ns 4 t dv-sda scl low to data out valid 0.5 s5 t dv-sto scl high to data out 1 ns 5 symbol parameter min. max units notes f vi-clk video in clock frequency 81 mhz t su-clk input setup time to vi_clk 2 ns 1 t h-clk input hold time from vi_clk 2 ns 1 symbol parameter min. 
max units notes f vo-clk video out clock frequency 81 mhz t clk-dv vo_clk to vo_data (or vo_io*) out 3 7.5 ns 1,3 t clk-dv vo_clk to vo_data (or vo_io*) out 3 7.5 ns 1,4 t su-clk vo_io* setup time to vo_clk 10 ns 2 t h-clk vo_io* hold time from vo_clk 3 ns 2
philips semiconductors pin list preliminary specification 1-21 1.9.7.16 audioin i/o timing notes: 1. see the timing m easurement conditions in figure 1-19 . 2. the timing measurements are done with respect to the clock edge according to clock_edge 3. ser_master asserted, i.e. audio in is the source of ai_ws. see the timing measurement condition in figure 1-20 . 1.9.7.17 audio out i/o timing notes: 1. see the timing m easurement conditions in figure 1-21 . 2. see the timing meas urement conditions in figure 1-23 . 3. the timing measurements are done with respect to the ao_sck clock edge according to clock_edge 4. pnx1300/01/02/11 is the serial interfac e master, i.e. ao_sck, ao_ws are outputs 5. pnx1300/01/02/11 is serial interface slave, i.e. ao_sck, ao_ws are inputs 6. see the timing meas urement conditions in figure 1-22 . 1.9.7.18 ssi i/o timing notes: 1. interrupt latency limits ssi to a pr actical use at a bit rate of 1.5 mbit/sec. 2. see the timing meas urement conditions in figure 1-24 . 3. see the timing meas urement conditions in figure 1-25 . symbol parameter min. max units notes f ai-sck audio in ai_sck clock frequency 22 mhz t su-sck input setup time to ai_sck 3 ns 1,2 t h-sck input hold time from ai_sck 2 ns 1,2 t sck-ws ai_sck to ai_ws 10 ns 3 symbol parameter min. max units notes f ao-sck audio out ao_sck clock frequency 22 mhz t sck-dv ao_sck to ao_sdx valid 2 12 ns 1,3,4 t sck-dv ao_sck to ao_sdx valid 2 12 ns 1,3,5 t su-sck input setup time to ao_sck 4 ns 2,3,5 t h-sck input hold time from ao_sck 2 ns 2,3,5 t sck-ws ao_sck to ao_ws 10 ns 3,4,6 symbol parameter min. max units notes f ssi-clk ssi_clk clock frequency 20 mhz 1 t clk-dv ssi_clk to data valid 2 12 ns 2 t su-clk input setup time to ssi_clk 3 ns 3 t h-clk input hold time from ssi_clk 2 ns 3
Figure 1-1. STRG3, STRG5 test load circuit (output buffer, PNX1300 pin, 30-ohm resistor, 50-ohm line of 2" true length, 12 pF at the rise/fall test point)
Figure 1-2. NORM3 test load circuit (output buffer, PNX1300 pin, 50-ohm line of 2" true length, 30 pF at the rise/fall test point)
Figure 1-3. WEAK5 test load circuit (output buffer, PNX1300 pin, 50-ohm line of 2" true length, 15 pF at the rise/fall test point)
Figure 1-4. PCI output timing measurement conditions (CLK, output delay, tri-state output; V_th, V_tl, V_test, V_trise, V_tfall, t_rval, t_fval, t_on, t_off)
Figure 1-5. PCI input timing measurement conditions (CLK, input valid; V_th, V_tl, V_test, V_max, t_su, t_h)
Figure 1-6. PCI t_val (max) rising edge (output buffer, pin, 1/2 in. max, 25 ohm, 10 pF)
Figure 1-7. PCI t_val (max) falling edge (output buffer, pin, 1/2 in. max, 25 ohm, 10 pF, VCC)
Figure 1-8. PCI t_val (min) and slew rate (output buffer, pin, 1/2 in. max, two 1-kohm resistors, 10 pF, VCC)
Figure 1-9. JTAG input timing (TCK; TDI, TMS valid; t_su_tck, t_h_tck)
Figure 1-10. JTAG output timing (TCK; TDO valid; t_clk_tdo)
Figure 1-11. I2C I/O timing (SCL; t_high, t_low, t_r, t_f)
Figure 1-12. I2C I/O timing (SCL, SDA; t_buf)
Figure 1-13. I2C I/O timing (SCL, SDA; t_su_sta, t_h_sta)
Figure 1-14. I2C I/O timing (SCL, SDA valid; t_su_sda, t_h_sda)
Figure 1-15. I2C I/O timing (SCL, SDA valid; t_dv_sda, t_dv_sto)
Figure 1-16. Video in I/O timing (VI_CLK; VI_DATA, VI_IO valid; t_su_clk, t_h_clk)
Figure 1-17. Video out I/O timing (VO_CLK; VO_DATA valid; t_clk_dv)
Figure 1-18. Video out I/O timing (VO_CLK; VO_IO valid; t_su_clk, t_h_clk)
Figure 1-19. Audio in I/O timing (AI_SCK; AI_SD, AI_WS valid; t_su_sck, t_h_sck)
Figure 1-20. Audio in I/O timing (AI_SCK; AI_WS valid; t_sck_ws)
Figure 1-21. Audio out I/O timing (AO_SCK; AO_SDx valid; t_sck_dv)
Figure 1-22. Audio out I/O timing (AO_SCK; AO_WS valid; t_sck_ws)
Figure 1-23. Audio out I/O timing (AO_SCK; AO_WS valid; t_su_sck, t_h_sck)
Figure 1-24. SSI I/O timing (SSI_CLK; SSI I/O valid; t_clk_dv)
Figure 1-25. SSI I/O timing (SSI_CLK; SSI_IO valid; t_su_clk, t_h_clk)
Overview
Chapter 2
by Gert Slavenburg

2.1 Introduction

In this document, the generic PNX1300 name refers to the PNX1300 Series, that is, the PNX1300/01/02/11 products.

PNX1300 is a successor to the TM-1300, TM-1100 and TM-1000 media processors. For those familiar with the TM-1300, the new features specific to the PNX1300 are summarized in Section 2.6. For those familiar with the TM-1100, the new features specific to the PNX1300 are summarized in Section 2.7. For those familiar with the TM-1000, new features for the PNX1300 are summarized in Section 2.8.

2.2 PNX1300 fundamentals

PNX1300 is a media processor for high-performance multimedia applications that deal with high-quality video and audio. These applications can range from low-cost, dedicated systems such as video phones, video editing, digital television, security systems or set-top boxes to reprogrammable, multipurpose plug-in cards for personal computers. PNX1300 easily implements popular multimedia standards such as MPEG-1 and MPEG-2, but its orientation around a powerful general-purpose CPU (called the DSPCPU) makes it capable of implementing a variety of multimedia algorithms, both open and proprietary. PNX1300 is also easily configured in multiple-processor configurations for very high-end applications.

More than just an integrated microprocessor with unusual peripherals, the PNX1300 is a fluid computer system controlled by a small real-time OS kernel running on a very-long instruction word (VLIW) processor core. PNX1300 contains a DSPCPU, a high-bandwidth internal bus, and internal bus-mastering DMA peripherals.

Software compatibility between current and future TriMedia processor family members is at the source-code and library API level; binary compatibility between family members is not guaranteed.
Defining software compatibility at the source-code level gives Philips the freedom to strike the optimum balance between cost and performance for all chips in the family. A powerful compiler and software development environment ensure that programmers never need to resort to non-portable assembler programming. Programmers use the library APIs and multimedia operations from C and C++ source code.

PNX1300 is designed for use either as an accelerator in a PC environment or as the sole CPU in cost-effective standalone systems. In standalone system applications, the PNX1300 external bus allows for glueless connection of 8-bit wide ROM, EEPROM, or flash memory for code storage. The external bus also allows intermixing of PCI 2.1 master/slave peripherals and simple 8-bit peripherals, such as UARTs and other 8-bit microprocessor peripherals. This powerful external bus architecture gives system designers a variety of options to configure low-cost, high-performance system solutions.

Because it is based on a general-purpose CPU, PNX1300 can also serve as a multifunctional PC enhancement vehicle. Typically, a PC must deal with multistandard video and audio streams, and applications require both decompression and compression. While the CPU chips used in PCs are becoming capable of low-resolution, real-time video decompression, high-quality decompression (not to mention compression) of studio-resolution video is still out of reach. Further, users expect their systems to handle live video and audio without sacrificing system responsiveness.

PNX1300 enhances a PC system by providing real-time multimedia with the advantages of a special-purpose, embedded solution (low cost and chip count) and the advantages of a general-purpose processor (reprogrammability). For PC applications, PNX1300 far surpasses the capabilities of fixed-function multimedia chips.
Future media processor family members will have different sets of interfaces appropriate for their intended use.

2.3 PNX1300 chip overview

Key features of PNX1300 include:
• A very powerful, general-purpose VLIW processor core (the DSPCPU) that coordinates all on-chip activities. In addition to implementing the non-trivial parts of multimedia algorithms, the DSPCPU runs a small real-time operating system driven by interrupts from the other units.
• Independent DMA-driven multimedia I/O units that properly format data to make software media processing efficient.
• DMA-driven multimedia coprocessors that operate independently and in parallel with the DSPCPU to perform operations specific to important multimedia algorithms.
• A high-performance bus and memory system that provides communication between PNX1300's processing units.
• A flexible external bus interface.

Figure 2-1 shows a PNX1300 block diagram. The bulk of a PNX1300 system consists of the PNX1300 microprocessor itself, external synchronous DRAM (SDRAM), and the external circuitry needed to interface to incoming and/or outgoing video and audio data streams and communication lines. PNX1300's external peripheral bus can gluelessly interface to PCI 2.1 components and/or 8-bit microprocessor peripherals.

Figure 2-2 shows a possible minimally configured PNX1300 system. In this case, a video input stream might come directly from a CCIR 656-compliant video camera chip in YUV 4:2:2 format through a glueless interface. An analog camera can be connected via a CCIR 656 interface chip (such as the Philips SAA7113H). PNX1300 outputs a CCIR 656 video stream to drive a dedicated video monitor. Stereo audio input and up to 8-channel audio output require only a low-cost external ADC and DAC. The operation of the video and audio interface units is highly customizable through programmable parameters.

The glueless PCI interface allows the PNX1300 to display video via a host PC's video card. The Image Coprocessor (ICP) provides display support for live video in an arbitrary number of arbitrarily overlapped windows.

Figure 2-1. PNX1300 block diagram. Blocks and labels shown:
• VLIW CPU with 32K I$ and 16K D$
• Video in: CCIR656 digital video, YUV 4:2:2, up to 81 MHz (40 Mpix/sec)
• Audio in: stereo digital audio, 8- and 16-bit data, I2S DC, up to 22 MHz AI_SCK
• Audio out: 2/4/6/8 ch. digital audio, 16- and 32-bit data, I2S DC, up to 22 MHz AO_SCK
• I2C interface: I2C bus to camera, etc.
• VLD coprocessor: Huffman decoder, slice-at-a-time MPEG-1 & 2
• Video out: CCIR656 digital video, YUV 4:2:2, up to 81 MHz (40 Mpix/sec)
• Timers
• Synchronous serial interface: analog modem or ISDN front end
• Image coprocessor: down & up scaling, YUV to RGB, 50 Mpix/sec
• PCI-XIO interface: external bus, PCI 2.1 (32 bits, 33 MHz) + glueless 24A/8D slaves
• SDRAM main memory interface: 32-bit data, up to 572 MB/sec
• DVDD
• SPDIF out: IEC958, up to 40 Mbit/sec

Figure 2-2. PNX1300 system connections. A minimal PNX1300 system requires few supporting components. Components shown: PNX1300, CCIR656 digital video in and out, 2Mx32 SDRAM, ADC for stereo audio in, DAC for 2-8 channel audio out, JTAG, modem front end, PCI and 8-bit peripheral bus, ROM.
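The throughput figures in the block diagram follow from the clock rates and bus widths. The check below is a minimal sketch under two assumptions: that CCIR 656 YUV 4:2:2 data occupies two bytes per pixel on the 8-bit video port, and that the 572 MB/sec figure corresponds to a 32-bit SDRAM interface clocked at 143 MHz. The function names are illustrative.

```python
def ccir656_pixel_rate_mpix(clock_mhz: float) -> float:
    """Pixel rate of an 8-bit 4:2:2 port: one byte per clock, two bytes per pixel."""
    return clock_mhz / 2


def peak_bandwidth_mb_s(clock_mhz: float, bus_bits: int) -> float:
    """Peak transfer rate of a bus moving bus_bits per clock, in MB/sec."""
    return clock_mhz * bus_bits / 8


video_mpix = ccir656_pixel_rate_mpix(81)   # 40.5, quoted as 40 Mpix/sec
sdram_mb_s = peak_bandwidth_mb_s(143, 32)  # 572.0 MB/sec
pci_mb_s = peak_bandwidth_mb_s(33, 32)     # 132.0 MB/sec peak for 32-bit, 33-MHz PCI

print(video_mpix, sdram_mb_s, pci_mb_s)
```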
Finally, the Synchronous Serial Interface (SSI) requires only an external ISDN or analog modem front-end chip and phone line interface to provide remote communication support. It can be used to connect PNX1300-based systems for video phone or videoconferencing applications, or it can be used for general-purpose data communication in PC systems.

The PNX1300 JTAG port allows a debugger on a host system to access and control the state of a PNX1300 in a target system. It also implements IEEE 1149.1 boundary scan functionality.

2.4 Brief examples of operation

The key to understanding PNX1300 operation is observing that the DSPCPU and peripherals are time-shared and that communication between units is through SDRAM memory. The DSPCPU switches from one task to the next: first it decompresses a video frame, then it decompresses a slice of the audio stream, then back to video, etc. As necessary, the DSPCPU issues commands to the peripheral function units to orchestrate their operation.

The DSPCPU can enlist the ICP and other coprocessors to help with some of the straightforward, tedious tasks associated with video processing. The ICP is very well suited for arbitrary-size horizontal and vertical video resizing and color space conversion.

The DSPCPU can enlist the input/output peripherals to autonomously receive or transmit digital video and audio data with minimal CPU supervision. The I/O units have been designed to interface to the outside world through industry-standard audio and video interfaces, while delivering or taking data in memory in formats suitable for software processing.

2.4.1 Video decompression in a PC

An example PNX1300 implementation is as a video-decompression engine on a PCI card in a PC. In this case, the PC does not need to know the PNX1300 has a powerful, general-purpose CPU; rather, the PC just treats the hardware on the PCI card as a "black-box" engine.
Video decompression begins when the PC operating system hands the PNX1300 a pointer to compressed video data in the PC's memory (the details of the communication protocol are handled by the software driver installed in the PC's operating system).

The DSPCPU fetches data from the compressed video stream via the PCI bus, decompresses frames from the video stream, and places them into local SDRAM. Decompression may be aided by the VLD (variable-length decoder) coprocessor unit, which implements Huffman decoding and is controlled by the DSPCPU.

When a frame is ready for display, the DSPCPU gives the ICP a display command. The ICP then autonomously fetches the decompressed frame data from SDRAM and transfers it over the PCI bus to the frame buffer in the PC's video display card. Alternatively, video can be sent to the graphics card using the VO unit.

2.4.2 Video compression

Another typical application for PNX1300 is video compression. In this case, uncompressed video is usually supplied directly to the PNX1300 system via the Video In (VI) unit. A camera chip connected directly to the VI unit supplies YUV data in 8-bit, 4:2:2 format. The VI unit samples the data from the camera chip and demultiplexes the raw video to SDRAM in three separate areas, one each for Y, U, and V.

When a complete video frame has been read from the camera chip by the VI unit, it interrupts the DSPCPU. The DSPCPU compresses the video data in software (using a set of powerful data-parallel multimedia operations) and writes the compressed data to a separate area of SDRAM.

The compressed video data can now be transmitted or stored in any of several ways. It can be sent to a host system over the PCI bus for archival on local mass storage, or the host can transfer the compressed video over a network. The data can also be sent to a remote system using the modem/ISDN interface to create, for example, a video phone or videoconferencing system.
Since the powerful, general-purpose DSPCPU is available, the compressed data can be encrypted for security before being transferred.

2.5 Introduction to PNX1300 blocks

The remainder of this chapter provides a brief introduction to the internal components of PNX1300.

2.5.1 Internal "data highway" bus

The internal bus (or data highway) connects all internal blocks together and provides access to the internal control/status registers of each block, external SDRAM, and the external bus peripheral chips. The internal bus consists of separate 32-bit data and address buses. Transactions on the bus use a block-transfer protocol. On-chip peripheral units and coprocessors can be masters or slaves on the bus.

Access to the internal bus is controlled by a central arbiter, which has a request line from each potential bus master. The arbiter is programmable so that the arbitration algorithm can be tailored for different applications. Peripheral units make requests to the arbiter for bus access and, depending on the arbitration mode, bus bandwidth is allocated to the units in different amounts. Each mode allocates bandwidth differently, but each mode guarantees each unit a minimum bandwidth and maximum service latency. All unused bandwidth is allocated to the DSPCPU.

The bus allocation mechanism is one of the features of PNX1300 that makes it a true real-time system instead of just a highly integrated microprocessor with unusual peripherals.
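The idea of a guaranteed minimum share with leftover bandwidth going to the DSPCPU can be illustrated with a deliberately simplified slot model. This is not the PNX1300 arbitration algorithm, which is described in Chapter 20; the unit names, slot counts, and the function `arbitrate_round` are hypothetical and chosen only to show the principle.

```python
def arbitrate_round(guaranteed, requested, round_slots):
    """Grant bus slots for one arbitration round (hypothetical model).

    guaranteed:  unit -> slots reserved for that unit in every round
    requested:   unit -> slots the unit actually asks for in this round
    round_slots: total slots in one round
    Each unit receives at most its reserved share; every slot left over,
    whether unreserved or unclaimed, goes to the DSPCPU.
    """
    if "dspcpu" in guaranteed:
        raise ValueError("the DSPCPU receives the leftover slots, not a reservation")
    if sum(guaranteed.values()) > round_slots:
        raise ValueError("reserved slots exceed the round length")

    grants = {unit: min(requested.get(unit, 0), share)
              for unit, share in guaranteed.items()}
    grants["dspcpu"] = round_slots - sum(grants.values())
    return grants


grants = arbitrate_round(
    guaranteed={"vi": 2, "vo": 3, "icp": 4},
    requested={"vi": 2, "vo": 1, "icp": 0},
    round_slots=16,
)
print(grants)
```

In this model a unit that always requests its share is served at least once per round, so the round length bounds its service latency; that mirrors the minimum-bandwidth and maximum-latency guarantee only in spirit.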
2.5.2 VLIW processor core

The heart of PNX1300 is a powerful 32-bit DSPCPU core. The DSPCPU implements a 32-bit linear address space and 128 fully general-purpose 32-bit registers. The registers are not separated into banks; any operation can use any register for any operand.

The PNX1300 core uses a VLIW instruction-set architecture and is fully general-purpose. The VLIW instruction length allows five simultaneous operations to be issued every clock cycle. These operations can target any five of the 27 functional units in the DSPCPU, including integer and floating-point arithmetic units and data-parallel multimedia operation units.

Although the processor core runs a real-time operating system to coordinate all activities in the PNX1300 system, the core is not intended for true general-purpose computer use. For example, the PNX1300 processor core does not implement demand-paged virtual memory, memory address translation, or 64-bit floating point, all of which are essential features in a general-purpose computer system.

PNX1300 uses a VLIW architecture to maximize processor throughput at the lowest possible cost. VLIW architectures have performance exceeding that of superscalar general-purpose CPUs without the cost and complexity of a superscalar CPU implementation. The hardware saved by eliminating superscalar logic reduces cost and allows the integration of multimedia-specific features that enhance the power of the processor core.

The PNX1300 operation set includes all traditional microprocessor operations. In addition, multimedia operations are included that dramatically accelerate standard video and audio compression and decompression algorithms. As just one of the five operations issued in a single PNX1300 instruction, a single "custom" or "media" operation can implement up to 11 traditional microprocessor operations.
These multimedia operations combined with the VLIW architecture result in tremendous throughput for multimedia applications.

The DSPCPU core is supported by separate 16-KB data and 32-KB instruction caches. The data cache is dual-ported to allow two simultaneous accesses; both caches are 8-way set-associative with a 64-byte block size.

2.5.3 Video in unit

The Video In (VI) unit interfaces directly to any CCIR 601/656-compliant device that outputs 8-bit parallel, 4:2:2 YUV time-multiplexed data. Such devices include direct digital camera systems, which can connect gluelessly to PNX1300 or through the standard CCIR 656 connector with only the addition of ECL level converters. A single-chip external device can be used to convert to/from serial D1 professional video. Non-CCIR-compliant devices can use a digital video decoder chip, such as the Philips SAA7113H, to interface to PNX1300.

The VI unit demultiplexes the captured YUV data before writing it into local PNX1300 SDRAM. Separate planar data structures are maintained for Y, U, and V.

The VI unit can be programmed to perform on-the-fly horizontal resolution subsampling by a factor of two if needed. Many camera systems capture a 640-pixel/line or 720-pixel/line image. With subsampling, direct conversion to a 320-pixel/line or a 360-pixel/line image can be performed with no DSPCPU intervention. Performing this function during video input reduces initial storage and bus bandwidth requirements for applications requiring reduced resolution.

2.5.4 Enhanced video out unit

The Enhanced Video Out (EVO) unit essentially performs the inverse function of the VI unit. EVO generates an 8-bit, CCIR 656 digital video data stream that contains a composited video and graphics overlay image. The video image is taken from separate Y, U, and V planar data structures in SDRAM. The graphics overlay is taken from a pixel-packed YUV data structure in SDRAM. Compositing allows both alpha-blending and chroma keying.
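The demultiplexing performed by the VI unit (Section 2.5.3) can be pictured in software. The sketch below splits one line of time-multiplexed 4:2:2 data into planar Y, U, and V arrays, using the Cb, Y, Cr, Y byte order of CCIR 656. The optional factor-of-two subsampling is modeled as plain decimation, which is an assumption: the hardware may filter before dropping samples. The function name is hypothetical.

```python
def demux_422_line(samples: bytes, subsample: bool = False):
    """Split one line of Cb Y Cr Y ... bytes into planar (Y, U, V) data.

    With subsample=True every second sample of each plane is kept,
    halving the horizontal resolution (plain decimation, no filtering).
    """
    if len(samples) % 4 != 0:
        raise ValueError("a 4:2:2 line must contain whole Cb-Y-Cr-Y groups")

    u = samples[0::4]  # Cb: first byte of every 4-byte group
    y = samples[1::2]  # Y: every odd-indexed byte
    v = samples[2::4]  # Cr: third byte of every 4-byte group

    if subsample:
        y, u, v = y[0::2], u[0::2], v[0::2]
    return y, u, v


# Four pixels: Cb0 Y0 Cr0 Y1 Cb1 Y2 Cr1 Y3
line = bytes([100, 1, 200, 2, 101, 3, 201, 4])
y_plane, u_plane, v_plane = demux_422_line(line)
half_y, half_u, half_v = demux_422_line(line, subsample=True)
```

For a 640-pixel line the full-resolution planes hold 640, 320, and 320 samples; with subsampling they hold 320, 160, and 160, which is where the storage and bandwidth saving comes from.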
The EVO unit can also upscale the video image horizontally by a factor of two to convert from CIF/SIF to CCIR 601 resolution. The overlay image, if enabled, is always in full-pixel resolution.

The EVO unit is capable of pixel emission rates up to 40 Mpix/sec and allows full programming of a horizontal and vertical frame/field structure. It is thus capable of refreshing both interlaced and non-interlaced ("two f_H") video displays with 4:3, 16:9, or other aspect ratios.

The sample rate for EVO unit pixels is independently and dynamically programmable. The high-quality, on-chip sample clock generator circuit allows the programmer subtle control over the sampling frequency so that audio and video synchronization can be achieved in any system configuration. When changing the sample frequency, the instantaneous phase does not change, which allows sample frequency manipulation without introducing audio or video distortion.

2.5.5 Image coprocessor

The ICP off-loads common image scaling or filtering tasks from the DSPCPU. Although these tasks can be easily performed by the DSPCPU, they are a poor use of the relatively expensive CPU resource. When performed in parallel by the ICP, these tasks are performed efficiently by simple hardware, which allows the DSPCPU to continue with more complex tasks.

The ICP can operate as either a memory-to-memory or a memory-to-PCI coprocessor device.

In memory-to-memory mode, the ICP can perform either horizontal or vertical image filtering and resizing. A high-quality algorithm is used (5-tap polyphase filter in each direction). Filtering or scaling is done in either the horizontal or vertical direction in one pass. Two invocations of the ICP are required to filter or resize in both directions.

In memory-to-PCI mode, the ICP can perform horizontal resizing followed by color-space conversion. For example, assume an n × m pixel array is to be displayed in a
window on the PC video screen while the PC is running a graphical user interface. The first step (if necessary) would use the ICP in memory-to-memory mode to perform a vertical resizing. The second step would use the ICP in memory-to-PCI mode to perform horizontal resizing and optional colorspace conversion from YUV to RGB.

While sending the final, resampled and converted pixels over the PCI bus to the video frame buffer, the ICP uses a full, per-pixel occlusion bit mask (accessed in destination coordinates) to determine which pixels are actually written to the graphics card frame buffer for display. Conditioning the transfer with the bit mask allows PNX1300 to accommodate an arbitrary arrangement of overlapping windows on the PC video screen.

Figure 2-3 illustrates a possible display situation and the data structures in SDRAM that support ICP operation. On the left, the PC video screen has four overlapping windows. Two, Image 1 and Image 2, are being used to display video generated by PNX1300. The right side shows a conceptual view of SDRAM contents. Two data structures are present, one for Image 1 and the other for Image 2. Figure 2-3 represents a point in time during which the ICP is displaying Image 2.

When the ICP is displaying an image (i.e., copying it from SDRAM to a frame buffer), it maintains four pointers to the SDRAM data structures. Three pointers locate the Y, U, and V data arrays; the fourth locates the per-pixel occlusion bit map. The Y, U, and V arrays are indexed by source coordinates while the occlusion bit map is accessed with screen coordinates.

As the ICP generates pixels for display, it performs horizontal scaling and colorspace conversion. The final RGB pixel value is then copied to the destination address in the screen's frame buffer only if the corresponding bit in the occlusion bit map is a "1".
As shown in the conceptual diagram, the occlusion bit map has a pattern of 1s and 0s corresponding to the shape of the visible area of the destination window in the frame buffer. When the arrangement of windows on the PC screen changes, modifications to the occlusion bit map are performed by PNX1300 or host-resident software.

It is important to note that there is no preset limit on the number and sizes of windows that can be handled by the ICP. The only limit is the available bandwidth. Thus, the ICP can handle a few large windows or many small windows. The ICP can sustain a transfer rate of 50 megapixels per second, which is more than enough to saturate PCI when transferring images to video frame buffers.

2.5.6 Variable-Length Decoder (VLD)

The variable-length decoder (VLD) relieves the DSPCPU of decoding Huffman-encoded video data streams. It can be used to help decode high-bitrate MPEG-1 and MPEG-2 video streams. The lower bitrate of videoconferencing can be adequately handled by DSPCPU software without a coprocessor.

The VLD is a memory-to-memory coprocessor. The DSPCPU hands the VLD a pointer to a Huffman-encoded bit stream, and the VLD produces a tokenized bit stream that is very convenient for the PNX1300 image decompression software to use. The format of the output token stream is optimized for the MPEG-2 decompression software so that communication between the DSPCPU and VLD is minimized.
Figure 2-3. ICP: windows on the PC screen and data structures in SDRAM for two live video windows. (Diagram: the PC screen with overlapping windows, including Image 1 and Image 2; in SDRAM, the Y, U, and V arrays and the occlusion bit map for each image.)
2.5.7 Audio In and Audio Out Units

The audio in (AI) and audio out (AO) units are similar to the video units. They connect to most serial ADC and DAC chips, and are programmable enough to handle most serial bit protocols. These units can transfer MSB or LSB first and left or right channel first.

The audio sampling clock is driven by PNX1300 and is software programmable within a wide range. Like the VO unit, AI and AO sample rates are separately and dynamically programmable. The high-quality on-chip sample clock generator circuits allow the programmer subtle control over the sampling frequency so that audio and video synchronization can be achieved in any system configuration. When changing the sample frequency, the instantaneous phase does not change, which allows sample frequency manipulation without introducing audio or video distortion.

As with the video units, the audio-in and audio-out units buffer incoming and outgoing audio data in SDRAM. The audio-in unit buffers samples in either 8- or 16-bit format, mono or stereo. The audio-out unit transfers 16- or 32-bit sample data for mono, stereo or up to 8 audio channels from memory to the external DACs. Any manipulation or mixing of sound data is performed by the DSPCPU since this processing will require only a small fraction of its processing capacity.

2.5.8 S/PDIF Out Unit

The Sony/Philips Digital Interface Out (SPDO) unit allows output of a 1-bit high-speed serial data stream. The primary application is output of digital audio data in Sony/Philips Digital Interface (S/PDIF) format to an external electrically isolated transformer. The SPDO unit can also be used as a general-purpose high-speed data stream output device such as a UART.

The SPDO unit supports 2-channel PCM audio, one or more Dolby Digital six-channel data streams, or one or more MPEG-1 or MPEG-2 audio streams (embedded per Project 1937).
It supports arbitrary programmable sample rates independent of and asynchronous to the AO unit sample rate.

2.5.9 Synchronous Serial Interface

The on-chip synchronous serial interface (SSI) is specially designed to interface to high-integration analog modem front ends or ISDN front-end devices. In the analog modem case, all of the modem signal processing is performed in the PNX1300 DSPCPU.

2.5.10 I2C Interface

The I2C bus is a 2-wire multi-master, multi-slave interface capable of transmitting up to 400 kbit/sec. PNX1300 implements an I2C master for use in single-master environments only. This interface allows PNX1300 to configure and inspect the status of I2C peripheral devices, such as video decoders, video encoders and some camera types.

2.6 New in PNX1300 (versus TM-1300)

PNX1300/01/02/11 offers the following improvements over the TM-1300:
• Lower core voltage for PNX1311 (2.2 V core voltage) and therefore lower power consumption.
• DSPCPU speed of up to 200 MHz for PNX1302.
• Support for 256-Mbit SDRAM organized in x16. The refresh counter must be changed. Refer to Section 12.11, "Refresh," in Chapter 12, "SDRAM Memory System," for details.
• Support for 16- and 32-bit main memory interface.
• Bug fixes in VI message passing mode.
• Additional VI mode where VI_DATA[9:8] in message passing mode are not affected by the VI_DVALID signal.
• PCI bug fix on PCI special cycles.
• Autonomous boot in non-1:1 ratio is fixed.

2.7 New in PNX1300 (versus TM-1100)

In addition to the features described in Section 2.6, PNX1300 also offers the following improvements over the TM-1100:
• No external matchout to matchin delay line.
• Video output speed improvement: up to 81 MHz.
• Video input speed improvement: up to 81 MHz.
• Prefetchable SDRAM aperture to increase performance. See Chapter 11, "PCI Interface."
• Individual powerdown capability for each coprocessor (e.g. ICP, EVO, etc.).
• New AO coprocessor with four separate channels and support of 16- or 32-bit samples. 8-bit samples are no longer supported.
• New SPDO coprocessor (for output of S/PDIF and other 1-bit high-speed serial data streams).

2.8 New in PNX1300 (versus TM-1000)

In addition to the features described in Section 2.7, PNX1300 also offers the following improvements over the TM-1000:
• New DSPCPU instructions. See Appendix A, "PNX1300/01/02/11 DSPCPU Operations."
• Video output unit improvements (8-bit alpha blending, chroma keying, genlock). See Chapter 7, "Enhanced Video Out."
• Capability to intermix PCI 2.1 and 8-bit peripherals or ROM/flash memories on the external bus. See Chapter 22, "PCI-XIO External I/O Bus."
• An on-chip DVD authentication/descrambling coprocessor. Information available to DVD product developers on special request.
• Full 1149.1 boundary scan.
• Improved PCI DMA read performance. See Chapter 11, "PCI Interface."
• Improved clock generation with new DDS blocks.
DSPCPU Architecture
Chapter 3
by Gert Slavenburg, Marcel Janssens

3.1 Basic Architecture Concepts

In this document the generic PNX1300 product name refers to the PNX1300 series, or the PNX1300/01/02/11 products.

This section documents the system programmer or "bare-machine" view of the PNX1300 CPU (or DSPCPU).

3.1.1 Register Model

Figure 3-1 shows the DSPCPU's 128 general-purpose registers, r0...r127. In addition to the hardware program counter, PC, there are 4 user-accessible special-purpose registers: PCSW, DPC (destination program counter), SPC (source program counter), and CCCOUNT. Table 3-1 lists the registers and their purposes.

Register r0 always contains the integer value '0', corresponding to the boolean value 'FALSE' or the single-precision floating point value +0.0. Register r1 always contains the integer value '1' ('TRUE'). The programmer is not allowed to write to r0 or r1.

Note: Writing to r0 or r1 may cause reads from r0 or r1 scheduled in adjacent clock cycles to return unpredictable values. The standard assembler forbids the use of r0 or r1 as a destination register.

Registers r2 through r127 are true general-purpose registers; the hardware does not imply their use in any way, though compiler or programmer conventions may assign particular roles to particular registers. The DPC and SPC relate to interrupt and exception handling and are treated in Section 3.1.4, "SPC and DPC: Source and Destination Program Counter." The PCSW (Program Control and Status Word) register is treated in Section 3.1.3, "PCSW Overview." CCCOUNT, the 64-bit clock cycle counter, is treated in Section 3.1.5, "CCCOUNT: Clock Cycle Counter."

Figure 3-1. PNX1300 registers. (Diagram: 128 general-purpose 32-bit registers r0..r127, with r0 and r1 fixed and r2-r127 variable; system status and control registers PC, PCSW, DPC, SPC, and the 64-bit CCCOUNT.)

Table 3-1. DSPCPU registers

Register   Size     Details
r0         32 bits  Always reads as 0x0; must not be used as destination of operations
r1         32 bits  Always reads as 0x1; must not be used as destination of operations
r2-r127    32 bits  126 general-purpose registers
PC         32 bits  Program counter
PCSW       32 bits  Program control & status word
DPC        32 bits  Destination program counter; latches target of taken branch that is interrupted
SPC        32 bits  Source program counter; latches target of taken branch that is not interrupted
CCCOUNT    64 bits  Counts clock cycles since reset
3.1.2 Basic DSPCPU Execution Model

The DSPCPU issues one "long instruction" every clock cycle. Each instruction consists of several operations (five operations for the PNX1300 microprocessor). Each operation is comparable to a RISC machine instruction, except that the execution of an operation is conditional upon the content of a general-purpose register. Examples of operations are:

    IF r10 iadd r11 r12 → r13     (if r10 true, add r11 and r12 and write sum in r13)
    IF r10 ld32d(4) r15 → r16     (if r10 true, load 32 bits from mem[r15+4] into r16)
    IF r20 jmpf r21 r22           (if r20 true and r21 false, jump to address in r22)

Each operation has a specific, known execution latency in clock cycles. For example, iadd takes 1 cycle; thus the result of an iadd operation started in clock cycle i is available for use as an argument to operations issued in cycle i+1 or later. The other operations issued in cycle i cannot use the result of iadd. The ld32d operation has a latency of 3 cycles. The result of an ld32d operation started in cycle j is available for use by other operations issued in cycle j+3 or later. Branches, such as the jmpf example above, have three delay slots. This means that if a branch operation in cycle k is taken, all operations in the instructions in cycles k+1, k+2 and k+3 are still executed.

In the above examples, r10 and r20 control conditional execution of the operations. This is known as "guarding"; here r10 and r20 contain the operation "guard". See Section 3.2.1, "Guarding (Conditional Execution)."

Certain restrictions exist in the choice of what operations can be packed into an instruction. For example, the DSPCPU in PNX1300 allows no more than two load/store class operations to be packed into a single instruction. Also, no more than five results (of previously started operations) can be written during any one cycle.
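The limit of five result writes per cycle can be checked mechanically once each operation's issue cycle and latency are known. The sketch below is a hypothetical C model, not part of any Philips tool: the `struct op` layout, the `writeback_ok` name, and the 64-cycle window are illustrative assumptions, and it assumes a result is written back exactly "latency" cycles after issue.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_WRITEBACKS 5    /* results written per cycle, from the text      */
#define MAX_CYCLES     64   /* size of the modeled window (an assumption)    */

/* Hypothetical description of one scheduled operation. */
struct op {
    int issue_cycle;        /* cycle in which the operation issues           */
    int latency;            /* e.g. 1 for iadd, 3 for ld32d                  */
};

/* Returns true if no cycle in [0, MAX_CYCLES) receives more than
 * MAX_WRITEBACKS results. Operations whose writeback falls outside the
 * modeled window are rejected. */
static bool writeback_ok(const struct op *ops, size_t n)
{
    int per_cycle[MAX_CYCLES] = {0};

    for (size_t i = 0; i < n; i++) {
        int wb = ops[i].issue_cycle + ops[i].latency;  /* writeback cycle */
        if (wb < 0 || wb >= MAX_CYCLES)
            return false;
        if (++per_cycle[wb] > MAX_WRITEBACKS)
            return false;
    }
    return true;
}
```

For instance, five iadd operations issued in cycle 3 all write back in cycle 4, which is legal on its own; adding an ld32d issued in cycle 1 puts a sixth result into cycle 4, and the check fails.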
The packing of operations is not normally done by the programmer. Instead, the instruction scheduler (see the Philips TriMedia SDE Reference Manual) takes care of converting the parallel intermediate format code into packed instructions ready for the assembler. The rules are formally described in the machine description file used by the instruction scheduler and other tools.

3.1.3 PCSW Overview

Figure 3-2 shows the PCSW register. The PNX1300 value of PCSW on reset is 0x800. For compatibility, any undefined PCSW fields should never be modified.

Note that the DSPCPU architecture has no condition codes or integer arithmetic status flags. Integer operations that generate out-of-range results deliver an operation-specific bit pattern. For examples, see dspiadd in Appendix A, "PNX1300/01/02/11 DSPCPU Operations." Predicate operations exist that take the place of integer status flags in a classical architecture. Multiword arithmetic is supported by the "carry" operation, which generates a '0' or '1' depending on the carry that would be generated if its arguments were summed.

FP-related fields. The IEEE mode field determines the IEEE rounding mode of all floating point operations, with the exception of a few floating point conversion operations that use a fixed rounding mode. For examples, see ifixrz and ifloatrz in Appendix A, "PNX1300/01/02/11 DSPCPU Operations."

The FP exception flags are "sticky bits" that are set as a side effect of floating-point computations. Each floating point operation can set one or more of the flags if it incurs the corresponding exception. The flags can only be reset by direct software manipulation of the PCSW (using the writepcsw operation). The bits have the meanings shown in Table 3-2.

The FP exception trap enable bits determine which FP exception flags invoke CPU exception handling. An exception is requested if the intersection of the exception flags and trap enable flags is non-zero.
The acceptance and handling of exceptions is described in Section 3.5, "Special Event Handling."

BSX (bytesex). The DSPCPU has a switchable bytesex. The BSX flag in the PCSW can be written by software. Load/store operations observe little- or big-endian byte ordering based on the current setting of BSX.

IEN (interrupt enable). The IEN flag disables or enables interrupt processing for most interrupt sources. Only NMI (non-maskable interrupt) bypasses IEN. The acceptance and handling of interrupts is described in Section 3.5.3, "INT and NMI (Maskable and Non-Maskable Interrupts)."

Figure 3-2. PNX1300 PCSW (Program Control and Status Word) register format. (Diagram: PCSW[15:0] holds the MSE, WBE, RSE, CS, IEN, BSX, IEEE mode, OFZ, IFZ, INV, OVF, UNF, INX, and DBZ fields; PCSW[31:16] holds the trap-enable bits TRPMSE, TRPWBE, TRPRSE, TRPOFZ, TRPIFZ, TRPINV, TRPOVF, TRPUNF, TRPINX, and TRPDBZ, plus TFE (trap on first exit); the remaining bits are undefined. Field encodings: IEEE rounding mode 0 = to nearest, 1 = to zero, 2 = to positive, 3 = to negative; IEN 1 = allow interrupts; BSX 1 = little endian; CS 1 = count stalls. PCSW = 0x800 after reset.)
CS (count stalls). The CS flag determines the mode of CCCOUNT, the 64-bit clock cycle counter. If CS = '1', the cycle counter increments on all clock cycles. If CS = '0', the clock cycle counter only increments on non-stall cycles. See also Section 3.1.5, "CCCOUNT: Clock Cycle Counter." After reset, CS is set to '1'.

MSE and TRPMSE (misaligned-store exception). The MSE bit will be set when the processor detects a store operation to an address that is not aligned. For example, a 32-bit store executed with an address that is not a multiple of four will cause MSE to be set. The TRPMSE bit enables the DSPCPU to raise misaligned address exceptions. An exception is requested if the intersection of MSE and TRPMSE is non-zero. The acceptance and handling of exceptions is described in Section 3.5, "Special Event Handling."

Unaligned load operations do not cause an exception, because load operations can be speculative (i.e. their result is thrown away). When the DSPCPU generates an unaligned address, the low-order address bit(s) (one bit in the case of a 16-bit load, two bits for a 32-bit load) are forced to zero and the load/store is executed from this aligned address.

WBE and TRPWBE (write back error). The WBE flag will be set whenever a program attempts to write back more than 5 results simultaneously. This is indicative of a programming error, likely caused by the scheduler or assembler. The TRPWBE bit enables the corresponding exception.

RSE, TRPRSE (reserved exception). RSE and TRPRSE are reserved for diagnostic purposes and not described here.

TFE (trap on first exit). The TFE bit is a support bit for the debugger. The TFE bit is set by the debugger prior to taking a (non-interruptible) jump to the application program. On the next interruptible jump (the first interruptible jump in the application being debugged), an exception is requested because the TFE bit is set.
The acceptance and handling of exception processing is described in Section 3.5, "Special Event Handling." It is the responsibility of the exception handler software to clear the TFE bit. The hardware does not clear or set TFE.

Corner-case note: Whenever a hardware update (e.g. an exception being raised) and a software update (through writepcsw) of the PCSW coincide, the new value of the PCSW will be the value that is written by the writepcsw instruction, except for those bits that the hardware is currently updating (which will reflect the hardware value).

3.1.4 SPC and DPC: Source and Destination Program Counter

The SPC and DPC registers are support registers for exception processing. The DPC is updated during every interruptible jump with the target address of that interruptible jump. If an exception is taken at an interruptible jump, the value in the DPC register can be used by the exception handling routine as the return address to resume the program at the place of interruption.

The SPC register is updated during every interruptible jump that is not interrupted by an exception. Thus on an interrupted interruptible jump, the SPC register is not updated. The SPC register allows the exception handling routine to determine the start address of the decision tree (a block of uninterruptible, scheduled PNX1300 code) that was executing when the exception was taken (see also Section 3.5, "Special Event Handling").

Corner-case note: Whenever a hardware update (during an interruptible jump) and a software update (through writedpc or writespc) coincide, the software update takes precedence.

3.1.5 CCCOUNT: Clock Cycle Counter

CCCOUNT is a 64-bit counter that counts clock cycles since reset. Cycle counting can occur in two modes, depending on PCSW.CS. If PCSW.CS = '1', the cycle count increments on every CPU clock cycle. If PCSW.CS = '0', the clock cycle count only increments on non-stall CPU cycles.
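The two counting modes can be modeled with a few lines of C. This is only an illustrative sketch: the `count_cycles` function and the per-cycle `stall` trace are hypothetical constructs introduced here to show the rule, and do not correspond to any PNX1300 register or API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Model of the CCCOUNT counting rule over a hypothetical trace.
 * stall[i] is true if CPU clock cycle i is a stall cycle.
 * cs mirrors PCSW.CS: true counts every cycle, false counts only
 * non-stall cycles. */
static uint64_t count_cycles(const bool *stall, size_t n, bool cs)
{
    uint64_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (cs || !stall[i])
            count++;
    }
    return count;
}
```

Running the same trace in both modes and subtracting the results gives the number of stall cycles, which is one plausible use of the CS bit when profiling.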
CCCOUNT is implemented as a master counter/slave register pair. The master 64-bit counter gets updated continuously. The value of the CCCOUNT slave register is updated with the current master cycle count during successful interruptible jumps only. The cycles and hicycles DSPCPU operations return the content of the 32 LSBs and 32 MSBs, respectively, of the slave register. This ensures that the value returned by hicycles and cycles is coherent, as long as there is no intervening interruptible jump, which makes these operations suitable for 64-bit high-resolution timing from C source code programs. The curcycles DSPCPU operation returns the 32 LSBs of the master counter. The latter operation can be used for instruction-cycle-precise timing. When used, it must be precisely placed, probably at the assembly code level.

3.1.6 Boolean Representation

The bit pattern generated by boolean-valued operations (ileq, fleq, etc.) is '00...00' (FALSE) or '00...01' (TRUE). When interpreting a bit pattern as a boolean value, only the LSB is taken into account, i.e. 'xx..x0' is interpreted as FALSE and 'xx..x1' is interpreted as TRUE. In particular, wherever a general-purpose register is used as a "guard", the LSB determines whether execution of the guarded operation takes place.

Table 3-2. PCSW FP exception flag definitions

Flag   Function
INV    Standard IEEE invalid flag
OVF    Standard IEEE overflow flag
UNF    Standard IEEE underflow flag
INX    Standard IEEE inexact flag
DBZ    Standard IEEE divide-by-zero flag
OFZ    "Output flushed to zero"; set if an operation caused a denormalized result
IFZ    "Input flushed to zero"; set if an operation was applied to one or more denormalized operands
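The sticky-flag and trap-enable mechanism from Section 3.1.3 reduces to two bitwise operations. The C sketch below uses the flag names of Table 3-2, but the bit positions in the enum are placeholders chosen for illustration; the real PCSW positions are those of Figure 3-2. The helper names `fp_accumulate` and `fp_exception_requested` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag names follow Table 3-2. Bit positions are illustrative only and
 * are NOT taken from the PCSW layout. */
enum fp_flag {
    FP_DBZ = 1u << 0,
    FP_INX = 1u << 1,
    FP_UNF = 1u << 2,
    FP_OVF = 1u << 3,
    FP_INV = 1u << 4,
    FP_IFZ = 1u << 5,
    FP_OFZ = 1u << 6
};

/* Sticky behavior: a flag raised by an operation stays set until
 * software clears it explicitly. */
static uint32_t fp_accumulate(uint32_t sticky, uint32_t raised)
{
    return sticky | raised;
}

/* An exception is requested if the intersection of the exception flags
 * and the trap-enable flags is non-zero. */
static bool fp_exception_requested(uint32_t sticky, uint32_t trap_enable)
{
    return (sticky & trap_enable) != 0;
}
```

With only the divide-by-zero trap enabled, an inexact or overflow result accumulates in the sticky flags without requesting an exception; enabling the overflow trap afterwards makes the already-set sticky bit request one.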
3.1.7 Integer Representation

The architecture supports the notion of 'unsigned integers' and 'signed integers.' Signed integers use the standard two's-complement representation.

Arithmetic on integers does not generate traps. If a result is not representable, the bit pattern returned is operation specific, as defined in the individual operation description section. The typical cases are:
• Wrap-around for regular add- and subtract-type operations.
• Clamping against the minimum or maximum representable value for DSP-type operations.
• Returning the least significant 32-bit value of a 64-bit result (e.g., integer/unsigned multiply).

3.1.8 Floating Point Representation

The PNX1300 architecture supports single-precision (32-bit) IEEE-754 floating point arithmetic.

All arithmetic conforms to the IEEE-754 standard in flush-to-zero mode.

All floating point compute operations round according to the current setting of the PCSW IEEE mode field. The current setting of the field determines result rounding (to nearest, to zero, to positive infinity, to negative infinity).

Conversions from float to integer/unsigned are available in two forms: a PCSW rounding-mode-observing form and an ANSI-C-specific-rounding form. The ANSI-C-specific form forces round to zero regardless of the PCSW IEEE rounding mode. Conversion from integer/unsigned to float always observes the IEEE rounding mode.

Floating point exceptions are supported with two mechanisms. Each individual floating point operation (e.g. fadd) has a counterpart operation (faddflags) that computes the exception flag values. These operations can be used for precise exception identification [1]. The second mechanism uses the "sticky" exception bits in the PCSW that collect aggregate exception events. The PCSW exception bits can selectively invoke CPU exception handling. See Section 3.5.2, "EXC (Exceptions)."
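Returning to the three typical out-of-range cases listed in Section 3.1.7, the following C sketch models each one in portable code. The function names are hypothetical and do not correspond one-to-one to DSPCPU operations; the exact result of any real operation is defined in its description in Appendix A.

```c
#include <stdint.h>

/* Case 1: wrap-around, as for regular add-type operations.
 * Unsigned arithmetic is used so the wrap is well defined in C. */
static uint32_t add_wrap(uint32_t a, uint32_t b)
{
    return (uint32_t)(a + b);
}

/* Case 2: clamping against the minimum or maximum representable value,
 * as for DSP-type operations. The sum is formed in 64 bits first. */
static int32_t add_clamp(int32_t a, int32_t b)
{
    int64_t s = (int64_t)a + (int64_t)b;
    if (s > INT32_MAX)
        return INT32_MAX;
    if (s < INT32_MIN)
        return INT32_MIN;
    return (int32_t)s;
}

/* Case 3: return the least significant 32 bits of a 64-bit product. */
static uint32_t mul_low32(uint32_t a, uint32_t b)
{
    return (uint32_t)((uint64_t)a * (uint64_t)b);
}
```

None of these helpers signals an error, matching the statement that integer arithmetic does not generate traps; the caller only ever sees a bit pattern.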
Table 3-3 shows the representation choices that were made in PNX1300's floating point implementation.

3.1.9 Addressing Modes

The addressing modes shown in Table 3-4 are supported by the DSPCPU architecture (store operations allow only displacement mode).

In these addressing modes, R[i] indicates one of the general-purpose registers. The scale factor applied (1/2/4) is equal to the size of the item loaded or stored, i.e. 1 for a byte operation, 2 for a 16-bit operation and 4 for a 32-bit operation. The range of valid 'i', 'j' and 'k' values may differ between implementations of the architecture; the minimum values for implementation-dependent characteristics are shown in Table 3-5.

Note that the assembly code specifies the true displacement, and not the value to be scaled. For example, 'ld32d(-8) r3' loads a 32-bit value from address (r3 - 8). This is encoded in the binary operation pattern as a -2 in the seven-bit field by the assembler. At runtime, the scale factor four is applied to reconstruct the intended displacement of -8.

3.1.10 Software Compatibility

The DSPCPU architecture expressly does not support binary compatibility between family members. The ANSI C compiler ensures that all family members are compatible at the source-code level.

[1] This mechanism allows precise exception identification in the context of our multi-issue microprocessor core, where many floating point operations may issue simultaneously, at the expense of additional operations generated by the compiler. It also allows the compiler to issue compute operations speculatively and compute exceptions precisely.

Table 3-3. Special float value representation

Item                                       Representation
+inf                                       0x7f800000
-inf                                       0xff800000
Self-generated qNaN                        0xffffffff
Result of operation on any NaN argument    argument | 0x00400000 (forcing the NaN to be quiet)
Signalling NaN                             Never generated by PNX1300; accepted as per IEEE-754

Table 3-4. Addressing modes

Mode                  Suffix   Applies to     Name
R[i] + scaled(#j)     d        Load & store   Displacement
R[i] + R[k]           r        Load only      Index
R[i] + scaled(R[k])   x        Load only      Scaled index

Table 3-5. Minimum values for implementation-dependent addressing mode components

Parameter     Minimum range
'i' and 'k'   0..127 (i.e., each implementation has at least 128 registers)
'j'           -64..63 (i.e., displacements will be at least 7 bits long and signed)
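The displacement scaling described in Section 3.1.9 can be worked through in code. The sketch below is a hedged illustration rather than the assembler's actual algorithm: `encode_disp` and `decode_disp` are hypothetical names, and the signed 7-bit field width is the minimum from Table 3-5, so a particular implementation may accept a wider range.

```c
#include <stdbool.h>
#include <stdint.h>

/* Encode a true byte displacement for an access of `size` bytes
 * (1, 2 or 4) into a signed 7-bit field holding displacement / size.
 * Returns false if the displacement is not a multiple of the access
 * size or does not fit in -64..63 after scaling. */
static bool encode_disp(int disp, int size, int8_t *field)
{
    if (size != 1 && size != 2 && size != 4)
        return false;
    if (disp % size != 0)
        return false;                 /* not representable after scaling */

    int scaled = disp / size;
    if (scaled < -64 || scaled > 63)
        return false;                 /* outside the 7-bit signed range  */

    *field = (int8_t)scaled;
    return true;
}

/* At run time the scale factor is re-applied to recover the displacement. */
static int decode_disp(int8_t field, int size)
{
    return field * size;
}
```

For the example in the text, a displacement of -8 on a 32-bit load encodes as -2 and decodes back to -8. Under the 7-bit assumption, the largest 32-bit displacement is 252 bytes and the smallest is -256 bytes.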
3.2 Instruction Set Overview

3.2.1 Guarding (Conditional Execution)

In the PNX1300 architecture, all operations can be optionally 'guarded'. A guarded operation executes conditionally, depending on the value in the 'guard' register. For example, a guarded add is written as:

    IF r23 iadd r14 r10 → r13

This should be taken to mean: if r23 then r13 := r14 + r10.

The 'IF r23' clause controls the execution of the operation based on the LSB of r23. Hence, depending on the LSB of r23, r13 is either unchanged or set to contain the integer sum of r14 and r10.

Guarding applies to all DSPCPU operations, except iimm and uimm (load-immediate). It controls the effect on all programmer-visible states of the system, i.e. register values, memory content, exception raising and device state.

3.2.2 Load and Store Operations

Memory is byte addressable. Loads and stores must be "naturally aligned", i.e. a 16-bit load or store must target an address that is a multiple of 2. A 32-bit load or store must target an address that is a multiple of 4. The BSX bit in the PCSW determines the byte order of loads and stores. For examples, see ld32 and st32 in Appendix A, "PNX1300/01/02/11 DSPCPU Operations."

Only 32-bit load and store operations are allowed to access MMIO registers in the MMIO address aperture (see Section 3.4, "Memory and MMIO"). The results are undefined for other loads and stores. A load from a non-existent MMIO register returns an undefined result. A store to a non-existent MMIO register times out and then does not happen. There are no other side effects of an access to a non-existent MMIO register. The state of the BSX bit has no effect on the result of MMIO accesses.

Loads are allowed to be issued speculatively. Loads outside the range of valid data memory addresses for the active process return an implementation-dependent value and do not generate an exception.
Misaligned loads also return an implementation-dependent value and do not generate an exception.

If a pair of memory operations involves one or more common bytes in memory, the effect on the common bytes is as defined in Table 3-6.

Table 3-4 shows the supported addressing modes. The minimum values of implementation-dependent addressing-mode components are shown in Table 3-5.

Note: The index and scaled-index modes are not allowed with store opcodes, due to the hardware restriction that each operation have at most 2 source operand registers and 1 condition register. Stores use 1 operand register for the value to be stored, leaving only 1 register to form an address.

The scale factor applied (1/2/4) in the scaled addressing modes is equal to the size of the item loaded or stored, i.e. 1 for a byte operation, 2 for a 16-bit operation and 4 for a 32-bit operation.

Table 3-7 lists the available load and store mnemonics for the three addressing modes.

Example usage of load and store operations:

    IF r10 ild16d(12) r12 → r13

If the LSB of r10 is set, load 16 bits starting at address (r12+12) using the byte ordering indicated in PCSW.BSX, sign-extend the value to 32 bits and store the result in r13.

    IF r10 st32d(40) r12 r13

If the LSB of r10 is set, store the 32-bit value from r13 to the address (r12+40) using the byte ordering indicated in PCSW.BSX.

Table 3-6. Behavior of loads and stores with coincident addresses

Condition          Behavior
Tstore < Tload     If a store is issued before a load, the value loaded contains the new bytes.
Tload < Tstore     If a load is issued before a store, the value loaded contains the old bytes.
Tstore1 < Tstore2  If store1 is issued before store2, the resulting value contains the bytes of store2.
Tstore = Tload     If a load and store are issued in the same clock cycle, the result is undefined.
Tstore1 = Tstore2  If two stores are issued in the same clock cycle, the resulting stored value is undefined.

Table 3-7. Load and store mnemonics

Operation              Displacement   Index    Scaled index
8-bit signed load      ild8d          ild8r    -
8-bit unsigned load    uld8d          uld8r    -
16-bit signed load     ild16d         ild16r   ild16x
16-bit unsigned load   uld16d         uld16r   uld16x
32-bit load            ld32d          ld32r    ld32x
8-bit store            st8d           -        -
16-bit store           st16d          -        -
32-bit store           st32d          -        -
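The effect of the byte-order setting and of sign extension on a 16-bit signed load (the ild16 family in Table 3-7) can be modeled in portable C. This is a sketch under stated assumptions: `load16_signed` is a hypothetical helper operating on a plain byte array, and the `little_endian` parameter stands in for PCSW.BSX (with 1 meaning little endian, as indicated in the PCSW figure). Guarding and alignment handling are deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Read 16 bits from mem[addr] using the selected byte order and
 * sign-extend the value to 32 bits. */
static int32_t load16_signed(const uint8_t *mem, uint32_t addr,
                             bool little_endian)
{
    uint16_t v = little_endian
        ? (uint16_t)(mem[addr] | (mem[addr + 1] << 8))    /* low byte first  */
        : (uint16_t)((mem[addr] << 8) | mem[addr + 1]);   /* high byte first */

    /* Sign-extend 16 -> 32 bits without relying on
     * implementation-defined narrowing conversions. */
    return (v & 0x8000u) ? (int32_t)v - 0x10000 : (int32_t)v;
}
```

The same two bytes therefore yield different register values depending on the byte-order setting, which is why code shared between hosts of different endianness has to agree on BSX before exchanging data through memory.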
3.2.3 Compute Operations

Compute operations are register-to-register operations. The specified operation is performed on one or two source registers and the result is written to the destination register.

Immediate operations. Immediate operations load an immediate constant (specified in the opcode) and produce a result in the destination register.

Floating-point compute operations. Floating-point compute operations are register-to-register operations. The specified operation is performed on one or two source registers and the result is written to the destination register. Unless otherwise mentioned, all floating point operations observe the rounding mode bits defined in the PCSW register. All floating-point operations not ending in "flags" update the PCSW exception flags. All operations ending in "flags" compute the exception flags as if the operation were executed and return the flag values (in the same format as in the PCSW); the exception flags in the PCSW itself remain unchanged.

Multimedia operations. These special compute operations are like normal compute operations, but the specified operations are not usually found in general-purpose CPUs. These operations provide special support for multimedia applications.

3.2.4 Special-Register Operations

Special-register operations operate on the special registers: PCSW, DPC, SPC and CCCOUNT.

3.2.5 Control-Flow Operations

Control-flow operations change the value of the program counter. Conditional jumps test the value in a register and, based on this value, change the program counter to the address contained in a second register or continue execution with the next instruction. Unconditional jumps always change the program counter to the specified immediate address.

Control-flow operations can be interruptible or non-interruptible.
Execution of an interruptible jump is the only occasion where PNX1300 allows special event handling to take place (see Section 3.5, "Special Event Handling").

3.3 PNX1300 Instruction Issue Rules

The PNX1300 VLIW CPU allows issue of 5 operations in each clock cycle according to a set of specific issue rules. The issue rules impose issue time constraints and a result writeback constraint. Any set of operations that meets all constraints constitutes a legal PNX1300 instruction. A more extensive description and a few special-case issue rules and limitations can be found in the Philips TriMedia SDE documentation.

Issue time constraints:
• An operation implies a need for a functional unit type (as documented in Appendix A, "PNX1300/01/02/11 DSPCPU Operations").
• Each operation requires an issue slot that has an instance of the appropriate functional unit type attached.
• Functional units should be "recovered" from any prior operation issues.

Writeback constraint:
• No more than 5 results should be simultaneously written to the register file at any point in time (writeback occurs "latency" cycles after issue).

Figure 3-3 shows all functional units of PNX1300, including the relation to issue slots, and each functional unit's latency (e.g. 1 for const, 3 for falu, etc.). With the exception of ftough, each functional unit can accept an operation every clock cycle, i.e. has a recovery time of 1.

Figure 3-3. PNX1300 issue slots, functional units, and latency. (Diagram: issue slots 1 to 5, with instances of the functional unit types const (5), alu (5), dspalu (2), dspmul (2), dmem (2), dmemspec (1), shifter (2), branch (3), falu (2), ifmul (2), fcomp (1), and ftough (1; latency 17, recovery 16) attached to them.)

The binding of operations to functional unit types is summarized in Table 3-8. In Appendix A, "PNX1300/01/02/11 DSPCPU Operations", each operation lists the precise functional unit and unit latency.

3.4 Memory and MMIO

PNX1300 defines four apertures in its 32-bit address space: the memory hole, the DRAM aperture, the MMIO aperture and the PCI apertures (see Figure 3-4). The memory hole covers addresses 0..0xff. The DRAM and MMIO apertures are defined by the values in MMIO registers; the PCI apertures consist of every address that does not fall in the other three apertures.

3.4.1 Memory Map

DRAM is mapped into an aperture extending from the address in DRAM_BASE to the address in DRAM_LIMIT. The maximum DRAM aperture size is 64 MB.

The MMIO aperture is located at address MMIO_BASE and is a fixed 2-MB size.

In the default operating mode, all memory accesses not going to either the hole, DRAM or MMIO space are interpreted as PCI accesses. This behavior can be overridden as described in Section 5.3.8, "Memory Hole and PCI Aperture Disable."

The MMIO aperture and the DRAM aperture can be at any naturally aligned location, in any order, but should not overlap; if they do, the consequences are undefined.

The values of DRAM_BASE, DRAM_LIMIT, and MMIO_BASE are set during the boot process.
in the case of a pci host assisted boot, the values are determined by the host bios. in the case of standalone boot (i.e., pnx1300 is the pci host), the values are taken from the boot rom. refer to chapter 13, "system boot" for details. dspcpu update of dram_base and mmio_base is possible, but not recommended; see section 11.6.3, "mmio/dram_base updates."

3.4.2 the memory hole

the memory hole from address 0 to 0xff serves to protect the system from performance loss due to speculative loads. due to the nature of c program references, most speculative loads issued by the dspcpu fall in the range covered by the hole. activated by default upon reset, the hole serves to ensure that these speculative loads do not cause pci read accesses and slow down the system. the value returned by any data load from the hole is 0.

the hole only protects loads. store operations in the hole do cause writes to pci, sdram or mmio as determined by the aperture base address values. if the sdram aperture overlaps the memory hole, the memory hole is ignored.

the hole can be temporarily disabled through the dc_lock_ctl register. this is described in section 5.3.8, "memory hole and pci aperture disable."

3.4.3 mmio memory map

devices are controlled through memory-mapped device registers, referred to as mmio registers. to ensure compatibility with future devices, any undefined mmio bits should be ignored when read, and written as "0"s. some devices can autonomously access data memory (dma) and most devices can cause cpu interrupts.

the 2-mb mmio aperture is initially located at address 0xefe00000 on reset; it is relocated by the pci bios for pc-hosted pnx1300 boards; its final location is determined by the boot eeprom for standalone systems. see chapter 13, "system boot" for more information.

table 3-8. functional unit operations

unit type   operation category
const       immediate operations
alu         32-bit arithmetic, logical, pack/unpack
dspalu      dual 16-bit, quad 8-bit multimedia arithmetic
dspmul      dual 16-bit and quad 8-bit multimedia multiplies
dmem        loads/stores
dmemspec    cache coherency, cache control, prefetch
shifter     multi-bit shift
branch      control flow
falu        floating point arithmetic & conversions
ifmul       32-bit integer and floating point multiplies
fcomp       single cycle floating point compares
ftough      iterative floating point square root and division

[figure 3-4. pnx1300 memory map. the 32-bit address space from 0x0000 0000 to 0xffff ffff contains the 256-byte hole at address 0, the dram aperture (1 mb to 64 mb, from dram_base to dram_limit) and the 2-mb mmio aperture at mmio_base; all remaining addresses are pci.]

figure 3-5 gives a detailed overview of the mmio memory map (addresses used are offsets with respect to the mmio base). the operating system on pnx1300 can change mmio_base by writing to the mmio_base mmio location. user programs should not attempt this. refer to the trimedia sde reference manual for the standard method to access the device registers from c language device drivers.

only 32-bit load and store operations are allowed to access mmio registers in the mmio address aperture. the results are undefined for other loads and stores. reads from nonexistent mmio registers return undefined values. writes to nonexistent mmio registers time out. there are no side effects of accesses to nonexistent mmio registers. the state of the pcsw bsx bit has no effect on the result of mmio accesses.

the icache tag and lru bit access aperture gives the dspcpu read-only access to the icache status. refer to section 5.4.8, "reading tags and cache status" for details.

the excvec mmio location is explained in section 3.5.2, "exc (exceptions)." section 3.5.3, "int and nmi (maskable and non-maskable interrupts)," describes the locations that deal with the setup and handling of interrupts: isetting, ipending, iclear, imask and the interrupt vectors. the timer mmio locations are described in section 3.8, "timers." the instruction and data breakpoints are described in section 3.9, "debug support." the mmio locations of each device are treated in the respective device chapters.

3.5 special event handling

the pnx1300 microprocessor responds to the special events shown in table 3-9, ordered by priority.
with the exception of reset, which is enabled at all times, the architecture of the dspcpu allows special event handling to begin only during an interruptible jump operation (ijmpt, ijmpf or ijmpi) that succeeds (i.e., is a taken jump). exc, nmi and int handling can be initiated during handling of an exc or an int, but only during successful interruptible jumps.

table 3-9. special events and event vectors

event                                                   vector
reset (highest priority)                                vector to dram_base
exc (all exceptions)                                    vector to excvec (programmable)
nmi, int (non-maskable interrupt, maskable interrupt)   use the programmed vector (one of 32 vectors depending on the interrupt source)

figure 3-5. memory map of mmio address space (addresses are offsets from mmio_base).

offset      block
0x00 0000   reserved for future use
0x01 0000   icache tags & lru (r/o)
            reserved for future use
0x10 0000   main memory, cache control
0x10 0400   mmio base
0x10 0800   vectored interrupt controller
0x10 0c00   timers
0x10 1000   debug support
0x10 1400   video in
0x10 1800   video out
0x10 1c00   audio in
0x10 2000   audio out
0x10 2400   image coprocessor
0x10 2800   vld coprocessor
0x10 2c00   ssi interface
0x10 3000   pci interface
0x10 3400   i2c interface
0x10 3800   jtag interface
0x1f ffff   end of mmio aperture

offset      register
0x10 0000   dram_base
0x10 0004   dram_limit
0x10 0400   mmio_base
0x10 0800   excvec
0x10 0810   isetting0
0x10 0814   isetting1
0x10 0818   isetting2
0x10 081c   isetting3
0x10 0820   ipending
0x10 0824   iclear
0x10 0828   imask
0x10 0880   intvec0
0x10 0884   intvec1
0x10 0888   intvec2
...
0x10 08f8   intvec30
0x10 08fc   intvec31
0x10 0c00   timer1
0x10 0c20   timer2
0x10 0c40   timer3
0x10 0c60   systimer
0x10 1000   instruction breakpoints
0x10 1200   data breakpoints
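the offsets in figure 3-5 translate directly into constants for a c device driver. the following is a minimal sketch, not the trimedia sde's standard access method: the names mmio_reg_addr, mmio_read32 and mmio_write32 are hypothetical, and the 4-byte alignment check is an assumption derived from the rule that only 32-bit loads and stores are defined for mmio registers.

```c
#include <assert.h>
#include <stdint.h>

/* offsets from mmio_base, copied from figure 3-5 */
enum {
    OFS_DRAM_BASE  = 0x100000,
    OFS_DRAM_LIMIT = 0x100004,
    OFS_MMIO_BASE  = 0x100400,
    OFS_EXCVEC     = 0x100800,
    OFS_ISETTING0  = 0x100810,
    OFS_IPENDING   = 0x100820,
    OFS_ICLEAR     = 0x100824,
    OFS_IMASK      = 0x100828,
    OFS_INTVEC0    = 0x100880,
    OFS_TIMER1     = 0x100c00
};

#define MMIO_APERTURE_SIZE 0x200000u   /* fixed 2-mb aperture */
#define MMIO_RESET_BASE    0xefe00000u /* location of the aperture on reset */

/* hypothetical helper: absolute address of a register in the aperture */
static inline uint32_t mmio_reg_addr(uint32_t mmio_base, uint32_t offset)
{
    assert(offset < MMIO_APERTURE_SIZE);
    assert((offset & 3u) == 0u); /* assumption: registers are word aligned */
    return mmio_base + offset;
}

/* hypothetical accessors: each performs exactly one 32-bit load or store */
static inline uint32_t mmio_read32(uintptr_t addr)
{
    return *(volatile uint32_t *)addr;
}

static inline void mmio_write32(uintptr_t addr, uint32_t value)
{
    *(volatile uint32_t *)addr = value;
}
```

with the reset value of mmio_base, mmio_reg_addr(MMIO_RESET_BASE, OFS_IMASK) evaluates to 0xeff00828. the volatile qualifier keeps the compiler from merging or dropping register accesses.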
the instruction scheduler uses interruptible jumps exclusively for inter-decision tree jumps. hence, within a decision tree, no special-event processing can be initiated. if a tree-to-tree jump is taken, special-event processing is allowed. since the only registers live at this point (i.e., that contain useful data) are the global registers allocated by the ansi c compiler, only a subset of the registers needs to be preserved by the event handlers. refer to the trimedia sde reference manual for details on which registers can be in use. the dspcpu register state can be described by the contents of this subset of general purpose registers, the contents of the pcsw and the dpc value (the target of the inter-tree jump).

the priority resolution mechanism built into the dspcpu hardware dispatches the highest-priority, non-masked special-event request at the time of a successful interruptible jump operation. in view of the simple, real-time-oriented nature of the mechanisms provided, only limited nesting of events should be allowed.

3.5.1 reset

reset is the highest priority special event. it is asserted by external hardware or by the host cpu. pnx1300 will respond to it at any time.

external hardware reset through the tri_reset# pin initiates boot protocol execution as described in chapter 13, "system boot." this causes the current pc value to be lost and instruction execution to start from address dram_base.

a pci host cpu can perform a pnx1300 dspcpu-only reset by an mmio write to the biu_ctl.sr and cr bits. such a reset does not cause a full boot; instead, the dspcpu resumes execution from dram_base.

3.5.2 exc (exceptions)

the dspcpu enters exc special-event processing under the following conditions:
1. reset is de-asserted.
2. the intersection pcsw[15,6:0] & pcsw[31,22:16] is non-empty or pcsw.tfe is set.
3. a successful interruptible jump is in the final jump execution stage.

dspcpu hardware takes the following actions on the initiation of exc processing:
1. dpc is assigned the intended destination address of the successful jump.
2. instruction processing starts at excvec.

all other actions are the responsibility of the exc handler software. note that no other special event processing will take place until the handler decides to execute an interruptible jump that succeeds.

3.5.3 int and nmi (maskable and non-maskable interrupts)

the on-chip vectored interrupt controller (vic) provides 32 int request input hardware lines. the interrupt controller prioritizes and maps attention requests from several different peripherals onto successive int requests to the dspcpu.

int special event processing will occur under the following conditions:
1. reset is de-asserted.
2. the intersection pcsw[15,6:0] & pcsw[31,22:16] is empty and pcsw.tfe is not set.
3. the intersection of ipending and imask is non-empty.
4. the interrupt is at level nmi or pcsw.ien = 1.
5. a successful interruptible jump is in the final jump execution stage.

dspcpu hardware takes the following actions on the initiation of nmi or int processing:
1. dpc is assigned the intended destination address of the successful jump.
2. instruction processing starts at the appropriate interrupt vector.

all other actions are the responsibility of the int handler software. note that no other special event processing will take place until the handler decides to execute an interruptible jump that succeeds.

3.5.3.1 interrupt vectors

each of the 32 interrupt sources can be assigned an arbitrary interrupt vector (the address of the first instruction of the interrupt handler). a vector is set up by writing the address to one of the mmio locations shown in figure 3-6. the state of the mmio vector locations is undefined after reset.
(addresses of the mmio vector registers are offsets with respect to mmio_base.)

figure 3-6. interrupt vector locations in mmio address space.

mmio_base offset   register (bits 31..0)   contents
0x10 0880          intvec0 (r/w)           source 0 vector
0x10 0884          intvec1 (r/w)           source 1 vector
0x10 0888          intvec2 (r/w)           source 2 vector
...
0x10 08f8          intvec30 (r/w)          source 30 vector
0x10 08fc          intvec31 (r/w)          source 31 vector
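the vector registers in figure 3-6 are spaced 4 bytes apart, so the offset for any source follows from its number. the sketch below illustrates that arithmetic; intvec_offset and set_int_vector are hypothetical names, and the mmio parameter is assumed to point at mmio_base viewed as an array of 32-bit words.

```c
#include <assert.h>
#include <stdint.h>

#define OFS_INTVEC0     0x100880u /* intvec0 offset from figure 3-6 */
#define NUM_INT_SOURCES 32u

/* offset of the vector register for an interrupt source (0..31) */
static inline uint32_t intvec_offset(unsigned source)
{
    assert(source < NUM_INT_SOURCES);
    return OFS_INTVEC0 + 4u * source;
}

/* hypothetical helper: store a handler address with one 32-bit write.
 * `mmio` is assumed to point at mmio_base as an array of 32-bit words. */
static inline void set_int_vector(volatile uint32_t *mmio, unsigned source,
                                  uint32_t handler_addr)
{
    mmio[intvec_offset(source) / 4u] = handler_addr;
}
```

because the vector locations are undefined after reset, a driver would write the vector before unmasking the corresponding source in imask.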
programmer's note: see the philips trimedia cookbook (book 2 of trimedia sde documentation) for information on writing interrupt handlers.

3.5.3.2 interrupt modes

dspcpu interrupt sources can be programmed to operate in either level-sensitive or edge-triggered mode. operation in edge-triggered or level-sensitive mode is determined by a bit in the isetting mmio locations corresponding to the source, as defined in figure 3-7. on reset, all isetting registers are cleared.

in edge-triggered mode, the leading edge of the signal on the device interrupt request line causes the vic (vectored interrupt controller) to set the interrupt pending flag corresponding to the device source number. note that, for active high signals, the leading edge is the positive edge, whereas for active low request signals (such as pci inta#), the negative edge is the leading edge. the interrupt remains pending until one of two events occurs:
• the vic successfully dispatches the vector corresponding to the source to the pnx1300 cpu, or
• pnx1300 cpu software clears the interrupt-pending flag by a direct write to the iclear location.

no interrupt acknowledge to iclear is needed for devices operating in edge-triggered mode, since the vector dispatch clears the ipending request. the device itself may however need a device-specific interrupt acknowledge to clear the requesting condition. edge-triggered mode is not recommended for devices that can signal multiple simultaneous interrupt conditions. the on-chip timers must be operated in edge-triggered mode.

in level-sensitive mode, the device requests an interrupt by asserting the vic source request line. the device holds the request until the device interrupt handler performs a device interrupt acknowledge. it is highly recommended that all off-chip and on-chip sources, with the exception of the timers, operate in level-sensitive mode.
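the mode bit lives in a 4-bit mp field per source (figure 3-7): the top bit of the field selects level-sensitive mode and the lower three bits hold the priority, with 7 meaning nmi. the sketch below computes an updated isetting register value under one assumption that the flattened figure does not state outright: that mp0 occupies bits 3..0 of isetting0 and each register holds eight consecutive sources. all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum int_mode { INT_EDGE = 0, INT_LEVEL = 1 };
#define INT_PRIO_NMI 7u /* priority code x111 requests an nmi */

/* build one 4-bit mp field: bit 3 = mode, bits 2..0 = priority */
static inline uint32_t mp_field(enum int_mode mode, unsigned priority)
{
    assert(priority <= INT_PRIO_NMI);
    return ((uint32_t)mode << 3) | priority;
}

/* which isetting register (0..3) holds the field for `source` */
static inline unsigned isetting_index(unsigned source)
{
    assert(source < 32u);
    return source / 8u;
}

/* bit position of the field inside that register (assumed layout) */
static inline unsigned isetting_shift(unsigned source)
{
    assert(source < 32u);
    return 4u * (source % 8u);
}

/* return `reg` with the mp field of `source` replaced */
static inline uint32_t isetting_update(uint32_t reg, unsigned source,
                                       enum int_mode mode, unsigned priority)
{
    unsigned shift = isetting_shift(source);
    return (reg & ~(0xfu << shift)) | (mp_field(mode, priority) << shift);
}
```

for example, programming source 9 as level-sensitive with priority 3 changes only bits 7..4 of isetting1. since all isetting registers are cleared on reset, every source starts in edge-triggered mode at level 0.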
3.5.3.3 device interrupt acknowledge

all devices capable of generating level-triggered interrupts have interrupt acknowledge bits in their memory-mapped control registers for this purpose. an interrupt acknowledge is performed by a store to such a control register, with a "1" in the bit position(s) corresponding to the desired acknowledge flags.

programmer's note: the store operation that performs the interrupt acknowledge should be issued at least 2 cycles before the (interruptible) jump that ends an interrupt handler. this ensures that the same interrupt is not dispatched twice due to request de-assertion clock delays.

3.5.3.4 interrupt priorities

each interrupt source can be programmed to request one out of eight levels of priorities. the highest priority level (level 7) corresponds to requesting an nmi, an interrupt that cannot be masked by the dspcpu pcsw.ien bit. the other levels request regular interrupts that can be masked as a group by the pcsw.ien flag. level six represents the highest priority normal interrupt level and level zero represents the lowest. refer to figure 3-7 for details of programming the priority level.

the vic arbitrates the highest-priority pending interrupt requestor. sources programmed to request at the same level are treated with a fixed priority, from source number 0 (highest) to 31 (lowest). at such time as the dspcpu is willing to process special events, the vector of the highest priority nmi source will be dispatched. if no nmi is pending, and the dspcpu allows regular interrupts (pcsw.ien is asserted), the vector of the highest priority regular source is dispatched. once a vector is dispatched, the corresponding interrupt pending flag is de-asserted (edge-triggered mode sources only).

3.5.3.5 interrupt masking

a single mmio register (imask in figure 3-8) allows masking of an arbitrary subset of the interrupt sources. masking applies to both regular and nmi level requestors.
masking is used by software to disable unused devices and/or to implement nested interrupt handling. in the latter case, each interrupt handler can stack the old imask content for later restoration and insert a new mask that only allows the interrupts it is willing to handle. for level-triggered device handlers, imask should also exclude the device itself to prevent repeated handler activation.

figure 3-7. interrupt mode and priority mmio locations and formats.

mmio_base offset   register          mp fields, bits 31..0 (4 bits each)
0x10 081c          isetting3 (r/w)   mp31 mp30 mp29 mp28 mp27 mp26 mp25 mp24
0x10 0818          isetting2 (r/w)   mp23 mp22 mp21 mp20 mp19 mp18 mp17 mp16
0x10 0814          isetting1 (r/w)   mp15 mp14 mp13 mp12 mp11 mp10 mp9 mp8
0x10 0810          isetting0 (r/w)   mp7 mp6 mp5 mp4 mp3 mp2 mp1 mp0

each mp field, mode:
0xxx   source operates in edge-triggered mode
1xxx   source operates in level-sensitive mode

each mp field, priority:
x111   nmi (highest) priority
x110   maskable level 6
...
x000   maskable level 0

each interrupt source device typically has its own interrupt enable flag(s) that determine whether certain key device events lead to the request of an interrupt. in addition, the pcsw.ien flag determines whether the dspcpu is willing to handle regular interrupts. non-maskable interrupts ignore the state of this flag.

all three mechanisms are necessary: the pcsw.ien flag is used to implement critical sections of code during which the rtos (real-time operating system) is unable to handle regular interrupts. the imask is used to allow full control over interrupt handler nesting. the device interrupt flags set the operational mode of the device.

when reset is asserted, ipending, iclear, and imask are set to all zeroes. (mmio register addresses shown in figure 3-8 are offset addresses with respect to mmio_base.)

3.5.3.6 software interrupts and acknowledgment

the ipending register shown in figure 3-8 can be read to observe the currently pending interrupts. each bit read depends on the mode of the source:
• for a level-sensitive source, a bit value corresponds to the current state of the device interrupt request line.
• for an edge-triggered interrupt, a "1" is read if and only if an interrupt request occurred and the corresponding vector has not yet been dispatched.

software can request an interrupt for sources operating in edge-triggered mode. writes to the ipending register assert an interrupt request for all sources where a 1 occurred in the bit position of the written value. the state of sources where a 0 occurred in the written value is unchanged. writes have no effect on level-sensitive mode sources. the interrupt request, if not masked, will occur at the next successful interruptible jump. this differs from the conventional software interrupt-like semantics of many architectures. any of the 32 sources can be requested in software. in normal operation however, software-requested interrupts should be limited to source vectors not allocated for hardware devices.
note that another pci master can request interrupts by manipulating the ipending location in the mmio aperture. this is useful for inter-processor communication.

the iclear register reads the same as the ipending register. writes to the iclear register serve to clear pending flags for edge-triggered mode sources. all ipending flags corresponding to bit positions in which "1"s are written are cleared. ipending flags corresponding to bit positions in which "0"s are written are not affected. writes have no effect on level-sensitive mode sources.

when a pending interrupt bit is being cleared through a write to the iclear register at the same time that the hardware is trying to set that interrupt bit, the hardware takes precedence.

3.5.3.7 nmi sequentialization

in most applications, it is desirable not to nest nmis. the nmi interrupt handler can accomplish this by saving the old imask content and clearing imask before the first interruptible jump is executed by the nmi handler.

3.5.3.8 interrupt source assignment

table 3-10 shows the assignment of devices to interrupt source numbers, as well as the recommended operating mode (edge or level triggered). note that there are a total of 5 external pins available to assert interrupt requests. the pci inta to intd requests are asserted by active low signal conventions, i.e. a zero level or a negative edge asserts a request. the userirq pin operates with active high signalling conventions.

3.6 pnx1300 to host interrupts

in systems where pnx1300 is operating in the presence of a host cpu on pci, pnx1300 can generate interrupts to the host, using any combination of the four pci inta# to intd# pins. in a typical host system, only one of these pins needs to be wired to the pci bus interrupt request lines. any unused pins of this group are then available for use as software programmable i/o pins.

the iex bits of the int_ctl register (see figure 3-9), when set, enable the open-collector drivers of the four intd#..inta# pins.
the intx bits determine the output value generated (if enabled). a "1" in intx causes the corresponding pci interrupt pin to be asserted (low intx# pin). the isx bits are read-only and reflect the current actual state of the pins. note that the pins have negative logic (active low) polarity and are of the open collector output type. hence the pin voltage is low (active) when the logical value set or seen in the int_ctl register is a "1". the assertion and de-assertion of host interrupts is the responsibility of pnx1300 software. see also section 11.6.17, "int_ctl register."

figure 3-8. interrupt controller request, clear, and mask mmio registers.

mmio_base offset   register         meaning of bit i (i = 31..0)
0x10 0828          imask (r/w)      on read or write, 0 = disallow source i interrupt request
                                    on read or write, 1 = allow source i interrupt request
0x10 0824          iclear (r/w)     on read, same as ipending(i)
                                    on write, 1 = clear source i interrupt request
0x10 0820          ipending (r/w)   on read, 1 = source i interrupt request is pending
                                    on write, 1 = software source i interrupt request

3.7 host to pnx1300 interrupts

a host cpu can generate an interrupt to pnx1300 in several ways:
• by a pci mmio write to ipending to assert the hostcomm interrupt (bit 28)
• by a hardware circuit that asserts one of the interrupt request pins tri_userirq, or inta..intd.

the first and most common method requires no circuitry and leaves the interrupt pins available for other purposes.

3.8 timers

the dspcpu contains four programmable timer/counters, all with the same function. the first three (timer1, timer2, timer3) are intended for general use. the fourth timer/counter (systimer) is reserved for use by the system software and should not be used by applications.

each timer has three registers as shown in figure 3-10. the mmio register addresses shown are offset addresses with respect to the timer's base address.

each timer/counter can be set to count one of the event types specified in table 3-12. note that the databreak event is special, in that the timer/counter may increment by zero, one or two in each clock cycle. for all other event types, increments are by zero or one. the cache1 and cache2 events serve as cache performance monitoring support. the actual event selected for cache1 and cache2 is determined by the mem_events mmio register; see section 5.7, "performance evaluation support." if a pnx1300 pin signal (vi_clk, etc.) is selected as an event, positive-going edges on the signal are counted.

each timer increments its value until the modulus is reached.
on the clock cycle where the incremented value would equal or exceed the modulus, the value wraps around to zero or one (in the case of an increment by two), and an interrupt is generated as defined in table 3-10. the timer interrupt source mode should be set as edge-triggered. no software interrupt acknowledge to the timer device is necessary.

counting starts and continues as long as the run bit is set.

loading a new modulus does not affect the contents of the value register. if a store operation to either the modulus or value register results in value and modulus being the same, no interrupt will be generated. if the run bit is set, the next value will be modulus+1 or modulus+2, and the counter will have to loop around before an interrupt is generated.

a modulus value of zero causes a wrap-around as if the modulus value was 2^32.

on reset, the tctl registers are cleared, and the value of the tmodulus and tvalue registers is undefined.

table 3-10. interrupt source assignments

source name   src num   mode              source description
pci inta      0         level             pci_inta# pin signal
pci intb      1         level             pci_intb# pin signal
pci intc      2         level             pci_intc# pin signal
pci intd      3         level             pci_intd# pin signal
tri_userirq   4         either            external general-purpose pin
timer1        5         edge              general-purpose timer
timer2        6         edge              general-purpose timer
timer3        7         edge              general-purpose timer
systimer      8         edge              reserved for debugger
videoin       9         level             video in block
videoout      10        level             video out block
audioin       11        level             audio in block
audioout      12        level             audio out block
icp           13        level             image coprocessor
vld           14        level             vld coprocessor
ssi           15        level             ssi interface
pci           16        level             pci biu (dma, etc.; see table 11-14 for possible interrupt causes)
iic           17        level             i2c interface
jtag          18        level             jtag interface
t.b.d.        19..24                      reserved for future devices
spdo          25        level             spdo block
t.b.d.        26..27                      reserved for future devices
hostcom       28        edge              (software) host communication
app           29        edge              (software) application
debugger      30        edge              (software) debugger
rtos          31        edge              (software) rtos

[figure 3-9. host interrupt control register. int_ctl (r/w) is located at mmio_base offset 0x10 3038 and contains the fields is[d:a], ie[d:a] and int[d:a].]

3.9 debug support

this section describes the special debug support offered by the dspcpu. instruction and data breakpoints can be defined through a set of registers in the mmio register space. when a breakpoint is matched, an event is generated that can be used as a timer source (see section 3.8, "timers"). the timer tmodulus has to be set to generate a dspcpu interrupt after the desired number of breakpoint matches.

3.9.1 instruction breakpoints

the instruction-breakpoint control register is shown in figure 3-11. on reset, the bictl register is cleared. (mmio-register addresses shown are offsets with respect to mmio_base.)

the instruction-breakpoint address-range registers are shown in figure 3-12. after reset, the value of these registers is undefined. (mmio-register addresses shown are offsets with respect to mmio_base.)

when the ic bit in the breakpoint control register is set to "1", instruction breakpoints are activated. any instruction address issued by the pnx1300 chip is compared against the low and high address-range values. the iac bit in the breakpoint control register determines whether the instruction address needs to be inside or outside of the range defined by the low and high address-range registers. a successful comparison takes place when either:
• iac = "0" and low ≤ iaddr ≤ high, or
• iac = "1" and iaddr < low or iaddr > high.

on a successful comparison, an instruction breakpoint event is generated, which can be used as a clock input to a timer. after counting the programmed number of instruction breakpoint events, the timer will generate an interrupt request.

table 3-11. timer base mmio address

timer      base address
timer1     mmio_base + 0x10 0c00
timer2     mmio_base + 0x10 0c20
timer3     mmio_base + 0x10 0c40
systimer   mmio_base + 0x10 0c60

table 3-12. timer source selections

source name     source bits value   source description
clock           0                   cpu clock
prescale        1                   prescaled cpu clock
tri_timer_clk   2                   external clock pin
databreak       3                   data breakpoints
instbreak       4                   instruction breakpoints
cache1          5                   cache event 1
cache2          6                   cache event 2
vi_clk          7                   video in clock pin
vo_clk          8                   video out clock pin
ai_ws           9                   audio in word strobe pin
ao_ws           10                  audio out word strobe pin
ssi_rxfsx       11                  ssi receive frame sync pin
ssi_io2         12                  ssi transmit frame sync pin
-               13-15               undefined

figure 3-10. timer register definitions.

timer base offset   register         contents
0                   tmodulus (r/w)   modulus (bits 31..0)
4                   tvalue (r/w)     value (bits 31..0)
8                   tctl (r/w)       prescale, source and run (r) fields

"prescale": prescale value is 2^prescale, i.e., in the range [1..32768]
"source" select: see table 3-12
"run" bit: 0 = timer stopped, 1 = timer running
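the wrap-around rules of section 3.8 can be captured in a small software model of one timer. this is a sketch for reasoning about modulus and value settings, not a description of the hardware implementation; the struct and function names are hypothetical. it assumes that the wrap with a modulus of zero (treated as 2^32) also raises the interrupt, which the text implies but does not state explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* software model of one timer/counter (hypothetical names) */
struct timer_model {
    uint32_t value;   /* tvalue */
    uint32_t modulus; /* tmodulus; 0 behaves like 2^32 */
    bool run;         /* tctl run bit */
};

/* advance the model by one clock cycle in which `events` (0, 1 or 2)
 * selected events occurred; returns true if an interrupt is requested */
static inline bool timer_step(struct timer_model *t, unsigned events)
{
    assert(events <= 2u); /* only databreak can deliver two per cycle */
    if (!t->run)
        return false;

    uint64_t limit = t->modulus ? t->modulus : (UINT64_C(1) << 32);
    uint64_t next = (uint64_t)t->value + events;

    if (t->value < limit && next >= limit) {
        /* reached or passed the modulus: wrap to 0 or 1 and interrupt */
        t->value = (uint32_t)(next - limit);
        return true;
    }
    /* value already at or above the modulus (e.g. after a store):
     * keep counting, wrapping at 2^32 without an interrupt */
    t->value = (uint32_t)next;
    return false;
}
```

the model shows why storing a value equal to the modulus delays the interrupt: the counter continues from modulus+1, has to loop around through 2^32, and only interrupts when it reaches the modulus again from below.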
3.9.2 data breakpoints

the data-breakpoint address-range and compare-value registers are shown in figure 3-13. after reset, the value of the data breakpoint registers is undefined. (mmio-register addresses shown are offsets with respect to mmio_base.)

the data-breakpoint control register is shown in figure 3-14. on reset, the bdctl register is cleared. (the register address shown is an offset with respect to mmio_base.)

when the dc bits in the data breakpoint control register are not set to "0", data breakpoints are activated. when the value of the dc bits is "1" or "3", any data address from load operations (if the bl bit is set) and/or store operations (if the bs bit is set) issued by the dspcpu is compared against the low and high address-range values. the dac bit in the breakpoint control register determines whether data addresses need to be inside or outside of the range defined by the low and high address-range registers. a successful comparison occurs when either:
• dac = "0" and low ≤ daddr ≤ high, or
• dac = "1" and daddr < low or daddr > high.

figure 3-11. instruction-breakpoint control register.

mmio_base offset   register      fields
0x10 1000          bictl (r/w)   iac, ic

"iac" instruction address control: 0 = breakpoint if address inside range, 1 = breakpoint if address outside range
"ic" instruction control bit: 0 = disable instruction breakpoints, 1 = enable instruction breakpoints

figure 3-12. instruction-breakpoint address-range registers.

mmio_base offset   register          contents (bits 31..0)
0x10 1004          binstlow (r/w)    address range start
0x10 1008          binsthigh (r/w)   address range end

figure 3-13. data-breakpoint address-range and value-compare registers.

mmio_base offset   register            contents (bits 31..0)
0x10 1030          bdataalow (r/w)     address range start
0x10 1034          bdataahigh (r/w)    address range end
0x10 1038          bdataval (r/w)      data breakpoint value
0x10 103c          bdatamask (r/w)     data breakpoint value mask
figure 3-14. data-breakpoint control register.

mmio_base offset   register      fields
0x10 1020          bdctl (r/w)   dvc, dac, dc, bs, bl

"dvc" data value control: 0 = breakpoint if data equal, 1 = breakpoint if data not equal
"dac" data address control: 0 = breakpoint if address inside range, 1 = breakpoint if address outside range
"bs" break on store: 0 = don't check data stores, 1 = do check data stores
"bl" break on load: 0 = don't check data loads, 1 = do check data loads
"dc" data control: 0 = no checking, 1 = check data addresses, 2 = check data values, 3 = check data values and addresses
note that this comparison works for all addresses regardless of the aperture to which they belong.

when the value of the dc bits is "2" or "3", any data value from load operations (if the bl bit is set) and/or store operations (if the bs bit is set) issued by the pnx1300 cpu is compared against the value in the bdataval register. only the bits for which the corresponding bdatamask register bits are set to "1" will be used in the comparison. the dvc bit in the breakpoint control register determines whether the data value needs to be equal or not equal to the comparison value. a successful comparison occurs when either of the following is true:
• dvc = "0" and (data & bdatamask) = (bdataval & bdatamask).
• dvc = "1" and (data & bdatamask) != (bdataval & bdatamask).

note: use a nonzero bdatamask or the result is undefined.

when a successful comparison has taken place, a data breakpoint event is generated, which can be used as a clock input to a timer. after counting the set number of data breakpoint events, the timer will generate an interrupt request.

when the value of the dc bits is "3", a data breakpoint event is generated if and only if a successful comparison occurs on both address and data simultaneously.

note that up to two data breakpoint events can occur per clock cycle, due to the dual load/store capability of the cpu and data cache.
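the data breakpoint conditions of section 3.9.2 combine into a single predicate. the following sketch models that predicate in c so that a register setup can be checked on paper before it is programmed; the struct layout and function names are hypothetical and do not mirror the bit layout of bdctl. the same addr_match helper also expresses the instruction breakpoint test of section 3.9.1, with iac in place of dac.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* decoded data-breakpoint settings (hypothetical struct, not a bit layout) */
struct data_bp {
    unsigned dc;    /* 0 off, 1 address, 2 value, 3 address and value */
    bool dac;       /* false: match inside range, true: outside */
    bool dvc;       /* false: match if equal, true: if not equal */
    bool bl, bs;    /* check loads / check stores */
    uint32_t low, high;   /* bdataalow, bdataahigh */
    uint32_t value, mask; /* bdataval, bdatamask (mask must be nonzero) */
};

/* range test shared by instruction and data breakpoints */
static inline bool addr_match(bool outside, uint32_t low, uint32_t high,
                              uint32_t addr)
{
    bool inside = (low <= addr) && (addr <= high);
    return outside ? !inside : inside;
}

/* masked compare of a loaded or stored value against bdataval */
static inline bool value_match(const struct data_bp *bp, uint32_t data)
{
    bool equal = (data & bp->mask) == (bp->value & bp->mask);
    return bp->dvc ? !equal : equal;
}

/* true if one load or store generates a data breakpoint event */
static inline bool data_bp_event(const struct data_bp *bp, bool is_store,
                                 uint32_t addr, uint32_t data)
{
    assert(bp->dc <= 3u);
    if (bp->dc == 0u)
        return false;
    if (is_store ? !bp->bs : !bp->bl)
        return false;

    bool hit = true;
    if (bp->dc & 1u)
        hit = hit && addr_match(bp->dac, bp->low, bp->high, addr);
    if (bp->dc & 2u)
        hit = hit && value_match(bp, data);
    return hit;
}
```

since the cpu can issue two loads or stores per cycle, a model of a full cycle would evaluate data_bp_event once per memory operation and pass the number of hits (0, 1 or 2) to the timer as its increment.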
Custom Operations for Multimedia                                  Chapter 4

by Gert Slavenburg, Pieter v.d. Meulen, Yong Cho, Sang-Ju Park

4.1 Custom operations overview

In this document, the generic PNX1300 name refers to the PNX1300 series, or the PNX1300/01/02/11 products.

Custom operations in the PNX1300 DSPCPU architecture are specialized, high-function operations designed to dramatically improve performance in important multimedia applications. When properly incorporated into application source code, custom operations enable an application to take advantage of the highly parallel PNX1300 microprocessor implementation. Achieving a similar performance increase through other means (e.g., executing a higher number of traditional microprocessor instructions per cycle) would be prohibitively expensive for PNX1300's low-cost target applications.

Custom operations are simple to understand and consistent in their definition, but their unusual functions make it difficult for automatic code generation algorithms to use them effectively. Consequently, custom operations are inserted into source code by the programmer. To make this process as painless as possible, custom operation syntax is consistent with the C programming language, and, just as with all other operations generated by the compiler, the scheduler takes care of register allocation, operation packing, and flow analysis.

4.1.1 Custom operation motivation

For both general-purpose and embedded microprocessor-based applications, programming in a high-level language is desirable. To effectively support optimizing compilers and a simple programming model, certain microprocessor architecture features are needed, such as a large, linear address space, general-purpose registers, and register-to-register operations that directly support the manipulation of linear address pointers.
A common choice in microprocessor architectures is 32-bit linear addresses, 32-bit registers, and 32-bit integer operations. PNX1300 is such a microprocessor architecture.

For the data manipulation in many algorithms, however, 32-bit data and operations are wasteful of expensive silicon resources. Important multimedia applications, such as the decompression of MPEG video streams, spend significant amounts of execution time dealing with eight-bit data items. Using 32-bit operations to manipulate small data items makes inefficient use of 32-bit execution hardware in the implementation. If these 32-bit resources could be used instead to operate on four eight-bit data items simultaneously, performance would be improved by a significant factor with only a tiny increase in implementation cost.

Getting the highest execution rate from standard microprocessor resources is one of the motivations behind custom operations in PNX1300. A range of custom operations is provided that each process, simultaneously, four 8-bit or two 16-bit data items. There is little cost difference between a standard 32-bit ALU and one that can process either one pair of 32-bit operands or four pairs of eight-bit operands, but there is a big performance difference for PNX1300's target applications.

PNX1300's custom operations go beyond simply making the best use of standard resources. Some custom operations combine several simple operations. These combinations are tailored specifically to the needs of important multimedia applications. Some high-function custom operations eliminate conditional branches, which helps the scheduler make effective use of all five operation slots in each PNX1300 instruction. Filling up all five slots is especially important in the inner loops of computationally intensive multimedia applications.

In short, custom operations help PNX1300 reach its goals of extremely high multimedia performance at the lowest possible cost.
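To make the idea of treating one 32-bit word as four 8-bit lanes concrete, here is a minimal sketch in portable C. It is not a PNX1300 custom operation: both function names are hypothetical, and the addition wraps modulo 256 in each lane instead of clipping as the dsp operations do. The point is only that one word-wide computation can stand in for four byte-wide ones.

```c
#include <assert.h>
#include <stdint.h>

/* Reference version: four separate byte additions, one lane at a time. */
uint32_t add_bytes_scalar(uint32_t x, uint32_t y)
{
    uint32_t result = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t a = (x >> shift) & 0xFFu;
        uint32_t b = (y >> shift) & 0xFFu;
        result |= ((a + b) & 0xFFu) << shift;
    }
    return result;
}

/* Packed version: all four lanes in one pass through the 32-bit adder.
 * The low 7 bits of each lane are added first so that no carry can
 * cross into the neighbouring lane; the top bit of each lane is then
 * fixed up with an exclusive-or. */
uint32_t add_bytes_packed(uint32_t x, uint32_t y)
{
    uint32_t low7 = (x & 0x7F7F7F7Fu) + (y & 0x7F7F7F7Fu);
    return low7 ^ ((x ^ y) & 0x80808080u);
}

/* Small self-check on one hand-worked input. */
void check_packed_add(void)
{
    assert(add_bytes_packed(0x01FF7F80u, 0x0101017Fu) == 0x020080FFu);
    assert(add_bytes_scalar(0x01FF7F80u, 0x0101017Fu) == 0x020080FFu);
}
```

On a conventional processor the packed version needs extra masking to keep the lanes apart. Hardware that splits the adder's carry chain at the byte boundaries removes that overhead, which is the small implementation cost the text refers to.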
4.1.2 Introduction to custom operations

Table 4-1 and Table 4-2 contain two listings of the custom operations available in the PNX1300 architecture. Table 4-1 groups the custom operations by type of function while Table 4-2 lists the operations by operand size. For more detailed information about the custom operations, see Appendix A, "PNX1300/01/02/11 DSPCPU Operations."

Some operations exist in several versions that differ in the treatment of their operands and results, and the mnemonics for these versions make it easy to select the appropriate operation. For example, the sum of products operations all have "fir" in their mnemonics; the prefix and suffix of the mnemonic express the treatment of the operands and result. The ifir8ii operation treats both of its operands as signed (the suffix "ii") and produces a signed result (the prefix "i"). The ifir8iu operation treats its first operand as signed (suffix "i"), the second as unsigned (suffix "u"), and produces a signed result (prefix "i"). The ume8ii operation implements an eight-bit motion estimation; it treats both operands as signed but produces an unsigned result.

The operations beginning with "dsp" implement a clipping (sometimes called saturating) function before storing the result(s) in the destination register. Otherwise, their naming follows the rules given above where appropriate. For example, the dspuquadaddui operation implements four 8-bit additions; it treats the first operand of each addition as unsigned, the second operand as signed, and produces an unsigned result for each addition. Each result, which is computed with no loss of precision, is clipped into the representable range of a byte (0..255).

Table 4-1. Key multimedia custom operations listed by function type

Function            Custom op        Description
DSP absolute value  dspiabs          Clipped signed 32-bit absolute value
                    dspidualabs      Dual clipped absolute values of signed 16-bit halfwords
Shift               dualasr          Dual-16 arithmetic shift right
Clip                dualiclipi       Dual-16 clip signed to signed
                    dualuclipi       Dual-16 clip signed to unsigned
Min, max            quadumax         Unsigned bytewise quad max
                    quadumin         Unsigned bytewise quad min
DSP add             dspiadd          Clipped signed 32-bit add
                    dspuadd          Clipped unsigned 32-bit add
                    dspidualadd      Dual clipped add of signed 16-bit halfwords
                    dspuquadaddui    Quad clipped add of unsigned/signed bytes
DSP multiply        dspimul          Clipped signed 32-bit multiply
                    dspumul          Clipped unsigned 32-bit multiply
                    dspidualmul      Dual clipped multiply of signed 16-bit halfwords
DSP subtract        dspisub          Clipped signed 32-bit subtract
                    dspusub          Clipped unsigned 32-bit subtract
                    dspidualsub      Dual clipped subtract of signed 16-bit halfwords
Sum of products     ifir16           Signed sum of products of signed 16-bit halfwords
                    ifir8ii          Signed sum of products of signed bytes
                    ifir8iu          Signed sum of products of signed/unsigned bytes
                    ufir16           Unsigned sum of products of unsigned 16-bit halfwords
                    ufir8uu          Unsigned sum of products of unsigned bytes
Merge, pack         mergedual16lsb   Merge dual-16 least-significant bytes
                    mergelsb         Merge least-significant bytes
                    mergemsb         Merge most-significant bytes
                    pack16lsb        Pack least-significant 16-bit halfwords
                    pack16msb        Pack most-significant 16-bit halfwords
                    packbytes        Pack least-significant bytes
Byte averages       quadavg          Unsigned byte-wise quad average
Byte multiplies     quadumulmsb      Unsigned quad 8-bit multiply most significant
Motion estimation   ume8ii           Unsigned sum of absolute values of signed 8-bit differences
                    ume8uu           Unsigned sum of absolute values of unsigned 8-bit differences

Table 4-2. Key multimedia custom operations listed by operand size

Op. size  Custom op        Description
32-bit    dspiabs          Clipped signed 32-bit abs value
          dspiadd          Clipped signed 32-bit add
          dspuadd          Clipped unsigned 32-bit add
          dspimul          Clipped signed 32-bit multiply
          dspumul          Clipped unsigned 32-bit multiply
          dspisub          Clipped signed 32-bit subtract
          dspusub          Clipped unsigned 32-bit subtract
16-bit    mergedual16lsb   Merge dual-16 least-significant bytes
          dualasr          Dual-16 arithmetic shift right
          dualiclipi       Dual-16 clip signed to signed
          dualuclipi       Dual-16 clip signed to unsigned
          dspidualmul      Dual clipped multiply of signed 16-bit halfwords
          dspidualabs      Dual clipped absolute values of signed 16-bit halfwords
          dspidualadd      Dual clipped add of signed 16-bit halfwords
          dspidualsub      Dual clipped subtract of signed 16-bit halfwords
          ifir16           Signed sum of products of signed 16-bit halfwords
          ufir16           Unsigned sum of products of unsigned 16-bit halfwords
          pack16lsb        Pack least-significant 16-bit halfwords
          pack16msb        Pack most-significant 16-bit halfwords
4.1.3 Example uses of custom ops

The next three sections illustrate the advantages of using custom operations. Also, the more complex examples illustrate how custom operations can be integrated into application code by providing listings of C-language program fragments. The examples progress in complexity from simple to intricate; the most interesting examples are taken from actual multimedia codes, such as MPEG decompression.

4.2 Example 1: Byte-matrix transposition

The goal of this example is to provide a simple, introductory illustration of how custom operations can significantly increase processing speed in small kernels of applications. As in most uses of custom operations, the power of custom operations in this case comes from their ability to operate on multiple data items in parallel.

Imagine that our task is to transpose a packed, 4-by-4 matrix of bytes in memory; the matrix might, for example, contain 8-bit pixel values. Figure 4-1 illustrates both the organization of the matrix in memory and the task to be performed in standard mathematical notation.

Performing this operation with traditional microprocessor instructions is straightforward but time consuming. One way to perform the manipulation is to perform 12 load-byte instructions (since only 12 of the 16 bytes need to be repositioned) and 12 store-byte instructions that place the bytes back in memory in their new positions. Another way would be to perform four load-word instructions, reposition the bytes in registers, and then perform four store-word instructions. Unfortunately, repositioning the bytes in registers would require a large number of instructions to properly shift and mask the bytes. Performing the 24 loads and stores makes implicit use of the shifting and masking hardware in the load/store units and thus yields a shorter instruction sequence.
The problem with performing 24 loads and stores is that loads and stores are inherently slow operations because they must access at least the cache and possibly slower layers in the memory hierarchy. Further, performing byte loads and stores when 32-bit word-wide accesses run just as fast wastes the power of the cache/memory interface. We would prefer a fast algorithm that takes full advantage of cache/memory bandwidth while not requiring an inordinate number of byte-manipulation instructions.

PNX1300 has instructions that merge and pack bytes and 16-bit halfwords directly and in parallel. Four of these instructions can be applied in this case to speed up the manipulation of bytes that are packed into words. Figure 4-2 shows the application of these instructions to the byte-matrix transposition problem, and the left side of Figure 4-3 shows a list of the operations needed to implement the matrix transpose. When assembled into actual PNX1300 instructions, these custom operations would be packed as tightly as dependencies allow, up to five operations per instruction.

Note that a programmer would not need to program at this level (PNX1300 assembler). The matrix transpose would be expressed just as efficiently in C-language source code, as shown on the right side of Figure 4-3. The low-level code is shown here for illustration purposes only.

The first sequence of four load-word operations in Figure 4-3 brings the packed words of the input matrix into registers r10, r11, r12, and r13. The next sequence of four merge operations produces intermediate results into registers r14, r15, r16, and r17.
The next sequence of four pack operations could then replace the original operands or place the transposed matrix in separate registers if the original matrix operands were needed for further computations (the PNX1300 optimizing C compiler performs this analysis automatically). In this example, the transposed matrix is placed in registers r18, r19, r20, and r21. The final four store-word operations put the transposed matrix back into memory.

Thus, using the PNX1300 custom operations, the byte-matrix transposition requires four load-word operations and four store-word operations (the minimum possible) and eight register-to-register data-manipulation operations. The result is 16 operations, or byte-matrix transposition at the rate of one operation per byte.

While the advantage of the custom-operation-based algorithm over the brute-force code that uses 24 load- and store-byte instructions seems to be only eight operations (a 33% reduction), the advantage is actually much greater. First, using custom operations, the number of memory references is reduced from 24 to eight (a factor of three). Since memory references are slower than register-to-register operations (such as the custom operations in this example), the reduction in memory references is significant.

Further, the ability of the PNX1300 VLIW compilation system to exploit the performance potential of the PNX1300 microprocessor hardware is enhanced by the custom-operation-based code. This is because it is easier for the compilation system to produce an optimal schedule (arrangement) of the code when the number of memory references is in balance with the number of register-to-register operations. The PNX1300 CPU (like all high-performance microprocessors) has a limit on the number of memory references that can be processed in a single cycle (two is the current limit). A long sequence of code that contains only memory references can result in empty operation slots in the long PNX1300 instructions. Empty operation slots waste the performance potential of the PNX1300 hardware.

Table 4-2. Key multimedia custom operations listed by operand size (continued)

Op. size  Custom op        Description
8-bit     quadumax         Unsigned bytewise quad max
          quadumin         Unsigned bytewise quad min
          dspuquadaddui    Quad clipped add of unsigned/signed bytes
          ifir8ii          Signed sum of products of signed bytes
          ifir8iu          Signed sum of products of signed/unsigned bytes
          ufir8uu          Unsigned sum of products of unsigned bytes
          mergelsb         Merge least-significant bytes
          mergemsb         Merge most-significant bytes
          packbytes        Pack least-significant bytes
          quadavg          Unsigned byte-wise quad average
          quadumulmsb      Unsigned quad 8-bit multiply most significant
          ume8ii           Unsigned sum of absolute values of signed 8-bit differences
          ume8uu           Unsigned sum of absolute values of unsigned 8-bit differences

                   Row major                         Column major
Memory location    31          0                     31          0
n+0:               a   b   c   d                     a   e   i   m
n+4:               e   f   g   h      transpose      b   f   j   n
n+8:               i   j   k   l     ----------->    c   g   k   o
n+12:              m   n   o   p                     d   h   l   p

                   | a b c d |                       | a e i m |
                   | e f g h |        transpose      | b f j n |
                   | i j k l |       ----------->    | c g k o |
                   | m n o p |                       | d h l p |

Figure 4-1. Byte-matrix transposition. Top shows byte matrices packed into memory words; bottom shows mathematical matrix representation.
As this example has shown, careful use of custom operations can not only reduce the absolute number of operations needed to perform a computation but can also help the compilation system produce code that fully exploits the performance potential of the PNX1300 CPU.

4.3 Example 2: MPEG image reconstruction

The complete MPEG video decoding algorithm is composed of many different phases, each with computationally intensive kernels. One important kernel deals with reconstructing a single image frame given that the forward- and backward-predicted frames and the inverse discrete cosine transform (IDCT) results have already been computed. This kernel provides an excellent opportunity to illustrate the power of PNX1300's specialized custom operators.

In the code fragments that follow, the backward-predicted block is assumed to have been computed into an array back[], the forward-predicted block is assumed to have been computed into forward[], and the IDCT results are assumed to have been computed into idct[].

    Row major inputs          Merge result              Pack result (column major)
    [a b c d], [e f g h]      mergemsb -> [a e b f]     pack16msb([a e b f], [i m j n]) -> [a e i m]
    [i j k l], [m n o p]      mergemsb -> [i m j n]     pack16lsb([a e b f], [i m j n]) -> [b f j n]
    [a b c d], [e f g h]      mergelsb -> [c g d h]     pack16msb([c g d h], [k o l p]) -> [c g k o]
    [i j k l], [m n o p]      mergelsb -> [k o l p]     pack16lsb([c g d h], [k o l p]) -> [d h l p]

Figure 4-2. Application of merge and pack instructions to the byte-matrix transposition of Figure 4-1.

    ld32d(0)  r100 -> r10             char matrix[4][4];
    ld32d(4)  r100 -> r11             . . .
    ld32d(8)  r100 -> r12             int *m = (int *) matrix;
    ld32d(12) r100 -> r13
    mergemsb  r10 r11 -> r14          temp0 = mergemsb(m[0], m[1]);
    mergemsb  r12 r13 -> r15          temp1 = mergemsb(m[2], m[3]);
    mergelsb  r10 r11 -> r16          temp2 = mergelsb(m[0], m[1]);
    mergelsb  r12 r13 -> r17          temp3 = mergelsb(m[2], m[3]);
    pack16msb r14 r15 -> r18          m[0] = pack16msb(temp0, temp1);
    pack16lsb r14 r15 -> r19          m[1] = pack16lsb(temp0, temp1);
    pack16msb r16 r17 -> r20          m[2] = pack16msb(temp2, temp3);
    pack16lsb r16 r17 -> r21          m[3] = pack16lsb(temp2, temp3);
    st32d(0)  r101 r18                . . .
    st32d(4)  r101 r19
    st32d(8)  r101 r20
    st32d(12) r101 r21

Figure 4-3. On the left is a complete list of operations to perform the byte-matrix transposition of Figure 4-1 and Figure 4-2. On the right is an equivalent C-language fragment.
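The transposition of Figure 4-3 can be tried out on any host with a portable emulation of the four merge and pack operations. The sketch below is written from the byte movement drawn in Figure 4-2, not from the formal definitions in Appendix A. The emu_ function names are hypothetical, and the matrix is passed as four 32-bit words with byte "a" in bits 31..24, as in Figure 4-1, so the result does not depend on host byte order.

```c
#include <assert.h>
#include <stdint.h>

/* [a b c d], [e f g h] -> [a e b f] */
uint32_t emu_mergemsb(uint32_t x, uint32_t y)
{
    return (x & 0xFF000000u) | ((y >> 8) & 0x00FF0000u) |
           ((x >> 8) & 0x0000FF00u) | ((y >> 16) & 0x000000FFu);
}

/* [a b c d], [e f g h] -> [c g d h] */
uint32_t emu_mergelsb(uint32_t x, uint32_t y)
{
    return ((x << 16) & 0xFF000000u) | ((y << 8) & 0x00FF0000u) |
           ((x << 8) & 0x0000FF00u) | (y & 0x000000FFu);
}

/* Upper halfword of x followed by upper halfword of y. */
uint32_t emu_pack16msb(uint32_t x, uint32_t y)
{
    return (x & 0xFFFF0000u) | (y >> 16);
}

/* Lower halfword of x followed by lower halfword of y. */
uint32_t emu_pack16lsb(uint32_t x, uint32_t y)
{
    return (x << 16) | (y & 0x0000FFFFu);
}

/* Same sequence as the C fragment in Figure 4-3: four merges, four packs. */
void transpose4x4(const uint32_t in[4], uint32_t out[4])
{
    uint32_t temp0 = emu_mergemsb(in[0], in[1]);
    uint32_t temp1 = emu_mergemsb(in[2], in[3]);
    uint32_t temp2 = emu_mergelsb(in[0], in[1]);
    uint32_t temp3 = emu_mergelsb(in[2], in[3]);

    out[0] = emu_pack16msb(temp0, temp1);
    out[1] = emu_pack16lsb(temp0, temp1);
    out[2] = emu_pack16msb(temp2, temp3);
    out[3] = emu_pack16lsb(temp2, temp3);
}

/* Self-check using the ASCII codes of 'a'..'p' as the matrix entries. */
void check_transpose(void)
{
    const uint32_t in[4] = { 0x61626364u, 0x65666768u,
                             0x696A6B6Cu, 0x6D6E6F70u };
    uint32_t out[4];

    transpose4x4(in, out);
    assert(out[0] == 0x6165696Du);   /* a e i m */
    assert(out[3] == 0x64686C70u);   /* d h l p */
}
```

Each emulated operation costs several shifts and masks here; the count of eight register-to-register operations in the text applies only when each one is a single PNX1300 operation.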
A straightforward coding of the reconstruction algorithm might look as shown in Figure 4-4. This implementation shares many of the undesirable properties of the first example of byte-matrix transposition. The code accesses memory a byte at a time instead of a word at a time, which wastes 75% of the available bandwidth. Also, in light of the many quad-byte-parallel operations introduced in Section 4.1.2, "Introduction to custom operations," it seems inefficient to spend three separate additions and one shift to process a single eight-bit pixel. Perhaps even more unfortunate for a VLIW processor like PNX1300 is the branch-intensive code that performs the saturation testing; eliminating these branches could reap a significant performance gain.

Since MPEG decoding is the kind of task for which PNX1300 was created, there are two custom operations, quadavg and dspuquadaddui, that exactly fit this important MPEG kernel (and other kernels). These custom operations process four pairs of 8-bit pixel values in parallel. In addition, dspuquadaddui performs saturation tests in hardware, which eliminates any need to execute explicit tests and branches.

For readers familiar with the details of MPEG algorithms, the use of eight-bit IDCT values later in this example may be confusing. The standard MPEG implementation calls for nine-bit IDCT values, but extensive analysis has shown that values outside the range [-128..127] occur so rarely that they can be considered unimportant. Pursuant to this observation, the IDCT values are clipped into the eight-bit range [-128..127] with saturating arithmetic before the frame reconstruction code runs. The assumption that this saturation occurs permits some of PNX1300's custom operations to have clean, simple definitions.
    void reconstruct (unsigned char *back,
                      unsigned char *forward,
                      char *idct,
                      unsigned char *destination)
    {
        int i, temp;

        for (i = 0; i < 64; i += 1) {
            temp = ((back[i] + forward[i] + 1) >> 1) + idct[i];
            if (temp > 255)
                temp = 255;
            else if (temp < 0)
                temp = 0;
            destination[i] = temp;
        }
    }

Figure 4-4. Straightforward code for MPEG frame reconstruction.

The first step in seeing how custom operations can be of value in this case is to unroll the loop by a factor of four. The unrolled code is shown in Figure 4-5. This creates code that is parallel with respect to the four pixel computations. As is easily seen in the code, the four groups of computations (one group per pixel) do not depend on each other.

After some experience is gained with custom operations, it is not necessary to unroll loops to discover situations where custom operations are useful. Often, a good programmer with knowledge of the function of the custom operations can see by simple inspection opportunities to exploit custom operations.

To understand how quadavg and dspuquadaddui can be used in this code, we examine the function of these custom operations.

The quadavg custom operation performs pixel averaging on four pairs of pixels in parallel. Formally, the operation of quadavg is as follows:

    quadavg rsrc1 rsrc2 -> rdest

takes arguments in registers rsrc1 and rsrc2, and it computes a result into register rdest. rsrc1 = [abcd], rsrc2 = [wxyz], and rdest = [pqrs] where a, b, c, d, w, x, y, z, p, q, r, and s are all unsigned eight-bit values. Then, quadavg computes the output vector [pqrs] as follows:

    p = (a + w + 1) >> 1
    q = (b + x + 1) >> 1
    r = (c + y + 1) >> 1
    s = (d + z + 1) >> 1

The pixel averaging in Figure 4-5 is evident in the first statement of each of the four groups of statements. The rest of the code (adding the idct[i] value and performing the saturation test) can be performed by the dspuquadaddui operation. Formally, its function is as follows:

    dspuquadaddui rsrc1 rsrc2 -> rdest

takes arguments in registers rsrc1 and rsrc2, and it computes a result into register rdest. rsrc1 = [efgh], rsrc2 = [stuv], and rdest = [ijkl] where e, f, g, h, i, j, k, and l are unsigned 8-bit values; s, t, u, and v are signed 8-bit values. Then, dspuquadaddui computes the output vector [ijkl] as follows:

    i = uclipi(e + s, 255)
    j = uclipi(f + t, 255)
    k = uclipi(g + u, 255)
    l = uclipi(h + v, 255)

The uclipi operation is defined in this case as it is for the separate PNX1300 operation of the same name described in Appendix A, "PNX1300/01/02/11 DSPCPU Operations." Its definition is as follows:
    int uclipi (int m, int n)
    {
        if (m < 0) return 0;
        else if (m > n) return n;
        else return m;
    }

To make it easier to see how these operations can subsume all the code in Figure 4-5, Figure 4-6 shows the same code rearranged to group the related functions. Now it should be clear that the quadavg operation can replace the first four lines of the loop assuming that we can get the individual 8-bit elements of the back[] and forward[] arrays positioned correctly into the bytes of a 32-bit word. That, of course, is easy: simply align the byte arrays on word boundaries and access them with word (integer) pointers.

Similarly, it should now be clear that the dspuquadaddui operation can replace the remaining code (except, of course, for storing the result into the destination[] array) assuming, as above, that the 8-bit elements are aligned and packed into 32-bit words.

Figure 4-7 shows the new code. The arrays are now accessed in 32-bit (int-sized) chunks, the loop iteration control has been modified to reflect the "four-at-a-time" operations, and the quadavg and dspuquadaddui operations have replaced the bulk of the loop code. Finally, Figure 4-8 shows a more compact expression of the loop code, eliminating the temporary variable. Note that the PNX1300 C compiler performs this optimization by itself.

Again, note that the code in Figure 4-7 and Figure 4-8 assumes that the character arrays are 32-bit word aligned and padded if necessary to fill an integral number of 32-bit words.

The original code required three additions, one shift, two tests, three loads, and one store per pixel. The new code using custom operations requires only two custom operations, three loads, and one store for four pixels, which is more than a factor of six improvement.
The actual performance improvement can be even greater depending on how well the compiler is able to deal with the branches in the original version of the code, which depends in part on the surrounding code. Reducing the number of branches almost always improves the chances of realizing maximum performance on the PNX1300 CPU.

The code in Figure 4-8 illustrates several aspects of using custom operations in C-language source code. First, the custom operations require no special declarations or syntax; they appear to be simple function calls. Second, there is no need to explicitly specify register assignments for sources, destinations, and intermediate results; the compiler and scheduler assign registers for custom operations just as they would for built-in language operations such as integer addition. Third, the scheduler packs custom operations into PNX1300 VLIW instructions as effectively as it packs operations generated by the compiler for native language constructs.

Thus, although the burden of making effective use of custom operations falls on the programmer, that burden consists only of discovering the opportunities for exploiting the operations and then coding them using standard C-language notation. The compiler and scheduler take care of the rest.
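The formal definitions of quadavg and dspuquadaddui given in this section are precise enough to emulate in portable C, which makes it possible to check the rewritten loop against the straightforward per-pixel code on a host machine. The sketch below is an assumption-laden stand-in, not PNX1300 code: the emu_ and reconstruct_ names are hypothetical, and each emulated operation is a loop over byte lanes where the hardware uses a single operation.

```c
#include <assert.h>
#include <stdint.h>

/* Per lane: (a + w + 1) >> 1 on unsigned bytes. */
uint32_t emu_quadavg(uint32_t x, uint32_t y)
{
    uint32_t result = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t a = (x >> shift) & 0xFFu;
        uint32_t w = (y >> shift) & 0xFFu;
        result |= ((a + w + 1u) >> 1) << shift;
    }
    return result;
}

/* Per lane: uclipi(e + s, 255), with e unsigned and s signed. */
uint32_t emu_dspuquadaddui(uint32_t x, uint32_t y)
{
    uint32_t result = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        int e = (int)((x >> shift) & 0xFFu);
        int s = (int)((y >> shift) & 0xFFu);
        if (s > 127)
            s -= 256;                    /* reinterpret as a signed byte */
        int sum = e + s;
        if (sum < 0)
            sum = 0;
        else if (sum > 255)
            sum = 255;
        result |= (uint32_t)sum << shift;
    }
    return result;
}

/* One iteration of the word-wide loop, on words that are already loaded. */
uint32_t reconstruct_word(uint32_t back, uint32_t forward, uint32_t idct)
{
    return emu_dspuquadaddui(emu_quadavg(back, forward), idct);
}

/* The per-pixel computation of the straightforward loop, for comparison. */
int reconstruct_pixel(int back, int forward, int idct)
{
    int temp = ((back + forward + 1) >> 1) + idct;
    if (temp > 255)
        temp = 255;
    else if (temp < 0)
        temp = 0;
    return temp;
}

/* Self-check: the top lane of the word result equals the scalar result. */
void check_reconstruct(void)
{
    uint32_t word = reconstruct_word(0x10FF0000u, 0x20FF0000u, 0x0501FF00u);
    assert(word == 0x1DFF0000u);
    assert((int)(word >> 24) == reconstruct_pixel(0x10, 0x20, 5));
}
```

The second lane of the self-check input exercises the upper clip (255 + 1 saturates to 255) and the third lane the lower clip (0 + (-1) saturates to 0), which are the two branches that the custom operation removes from the loop.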
    void reconstruct (unsigned char *back,
                      unsigned char *forward,
                      char *idct,
                      unsigned char *destination)
    {
        int i, temp;

        for (i = 0; i < 64; i += 4) {
            temp = ((back[i+0] + forward[i+0] + 1) >> 1) + idct[i+0];
            if (temp > 255) temp = 255;
            else if (temp < 0) temp = 0;
            destination[i+0] = temp;

            temp = ((back[i+1] + forward[i+1] + 1) >> 1) + idct[i+1];
            if (temp > 255) temp = 255;
            else if (temp < 0) temp = 0;
            destination[i+1] = temp;

            temp = ((back[i+2] + forward[i+2] + 1) >> 1) + idct[i+2];
            if (temp > 255) temp = 255;
            else if (temp < 0) temp = 0;
            destination[i+2] = temp;

            temp = ((back[i+3] + forward[i+3] + 1) >> 1) + idct[i+3];
            if (temp > 255) temp = 255;
            else if (temp < 0) temp = 0;
            destination[i+3] = temp;
        }
    }

Figure 4-5. MPEG frame reconstruction code using PNX1300 custom operations; compare with Figure 4-4.
4.4 Example 3: Motion-estimation kernel

Another part of the MPEG coding algorithm is motion estimation. The purpose of motion estimation is to reduce the cost of storing a frame of video by expressing the contents of the frame in terms of adjacent frames. A given frame is reduced to small blocks, and a subsequent frame is represented by specifying how these small blocks change position and appearance; usually, storing the difference information is cheaper than storing a whole block. For example, in a video sequence where the camera pans across a static scene, some frames can be expressed simply as displaced versions of their predecessor frames. To create a subsequent frame, most blocks are simply displaced relative to the output screen.

The code in this example is for a match-cost calculation, a small kernel of the complete motion-estimation code. As with the previous example, this code provides an excellent example of how to transform source code to make the best use of PNX1300's custom operations.

    void reconstruct (unsigned char *back,
                      unsigned char *forward,
                      char *idct,
                      unsigned char *destination)
    {
        int i, temp0, temp1, temp2, temp3;

        for (i = 0; i < 64; i += 4) {
            temp0 = ((back[i+0] + forward[i+0] + 1) >> 1);
            temp1 = ((back[i+1] + forward[i+1] + 1) >> 1);
            temp2 = ((back[i+2] + forward[i+2] + 1) >> 1);
            temp3 = ((back[i+3] + forward[i+3] + 1) >> 1);

            temp0 += idct[i+0];
            if (temp0 > 255) temp0 = 255;
            else if (temp0 < 0) temp0 = 0;

            temp1 += idct[i+1];
            if (temp1 > 255) temp1 = 255;
            else if (temp1 < 0) temp1 = 0;

            temp2 += idct[i+2];
            if (temp2 > 255) temp2 = 255;
            else if (temp2 < 0) temp2 = 0;

            temp3 += idct[i+3];
            if (temp3 > 255) temp3 = 255;
            else if (temp3 < 0) temp3 = 0;

            destination[i+0] = temp0;
            destination[i+1] = temp1;
            destination[i+2] = temp2;
            destination[i+3] = temp3;
        }
    }

Figure 4-6. Re-grouped code of Figure 4-5.
    void reconstruct (unsigned char *back,
                      unsigned char *forward,
                      char *idct,
                      unsigned char *destination)
    {
        int i, temp;
        int *i_back = (int *) back;
        int *i_forward = (int *) forward;
        int *i_idct = (int *) idct;
        int *i_dest = (int *) destination;

        for (i = 0; i < 16; i += 1) {
            temp = quadavg(i_back[i], i_forward[i]);
            temp = dspuquadaddui(temp, i_idct[i]);
            i_dest[i] = temp;
        }
    }

Figure 4-7. Using the custom operations quadavg and dspuquadaddui to speed up the loop of Figure 4-6.
Figure 4-9 shows the original source code for the match-cost loop. Unlike the previous example, the code is not a self-contained function. Somewhere early in the code, the arrays a[][] and b[][] are declared; somewhere between those declarations and the loop of interest, the arrays are filled with data.

4.4.1 A simple transformation

First, we will look at the simplest way to use a PNX1300 custom operation.

We start by noticing that the computation in the loop of Figure 4-9 involves the absolute value of the difference of two unsigned characters (bytes). By now, we are familiar with the fact that PNX1300 includes a number of operations that process all four bytes in a 32-bit word simultaneously. Since the match-cost calculation is fundamental to the MPEG algorithm, it is not surprising to find a custom operation, ume8uu, that implements this operation exactly.

To understand how ume8uu can be used in this case, we need to transform the code as in the previous example. Though the steps are presented here in detail, a programmer with even a little experience can often perform these transformations by visual inspection.

To use a custom operation that processes 4 pixel values simultaneously, we first need to create 4 parallel pixel computations. Figure 4-10 shows the loop of Figure 4-9 unrolled by a factor of 4. Unfortunately, the code in the unrolled loop is not parallel because each line depends on the one above it. Figure 4-11 shows a more parallel version of the code from Figure 4-10. By simply giving each computation its own cost variable and then summing the costs all at once, each cost computation is completely independent.
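The ume8uu operation is listed in Table 4-1 as an unsigned sum of absolute values of unsigned 8-bit differences. Assuming that description is complete, the operation can be emulated on a host to confirm that a word-at-a-time match-cost loop gives the same total as the byte-at-a-time loop. The names emu_ume8uu, match_cost_bytes, and match_cost_words are hypothetical, and the word loop takes pre-packed 32-bit words so that host byte order does not matter.

```c
#include <assert.h>
#include <stdint.h>

/* Sum over the four byte lanes of |a - b|, all bytes unsigned. */
uint32_t emu_ume8uu(uint32_t x, uint32_t y)
{
    uint32_t sum = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        int a = (int)((x >> shift) & 0xFFu);
        int b = (int)((y >> shift) & 0xFFu);
        sum += (uint32_t)(a > b ? a - b : b - a);
    }
    return sum;
}

/* Byte-at-a-time match cost, equivalent to the original nested loop
 * flattened over count bytes. */
uint32_t match_cost_bytes(const unsigned char *a, const unsigned char *b,
                          int count)
{
    uint32_t cost = 0;
    for (int i = 0; i < count; i += 1) {
        int diff = (int)a[i] - (int)b[i];
        cost += (uint32_t)(diff < 0 ? -diff : diff);
    }
    return cost;
}

/* Word-at-a-time match cost: one emulated ume8uu per four pixels. */
uint32_t match_cost_words(const uint32_t *a, const uint32_t *b, int nwords)
{
    uint32_t cost = 0;
    for (int i = 0; i < nwords; i += 1)
        cost += emu_ume8uu(a[i], b[i]);
    return cost;
}

/* Self-check on a hand-worked pair of words: 48 + 16 + 16 + 48 = 128. */
void check_match_cost(void)
{
    const unsigned char pa[4] = { 0x10, 0x20, 0x30, 0x40 };
    const unsigned char pb[4] = { 0x40, 0x30, 0x20, 0x10 };
    const uint32_t wa[1] = { 0x10203040u };
    const uint32_t wb[1] = { 0x40302010u };

    assert(match_cost_bytes(pa, pb, 4) == 128u);
    assert(match_cost_words(wa, wb, 1) == 128u);
}
```

Because the cost is a plain sum, the order of the bytes inside a word does not affect the total. This is why the casts to word pointers in the transformed code are safe for this kernel regardless of endianness, provided the arrays are word aligned.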
    void reconstruct (unsigned char *back,
                      unsigned char *forward,
                      char *idct,
                      unsigned char *destination)
    {
        int i;
        int *i_back = (int *) back;
        int *i_forward = (int *) forward;
        int *i_idct = (int *) idct;
        int *i_dest = (int *) destination;

        for (i = 0; i < 16; i += 1)
            i_dest[i] = dspuquadaddui(quadavg(i_back[i], i_forward[i]), i_idct[i]);
    }

Figure 4-8. Final version of the frame-reconstruction code.

    unsigned char a[16][16];
    unsigned char b[16][16];
    . . .
    for (row = 0; row < 16; row += 1) {
        for (col = 0; col < 16; col += 1)
            cost += abs(a[row][col] - b[row][col]);
    }

Figure 4-9. Match-cost loop for MPEG motion estimation.

    unsigned char a[16][16];
    unsigned char b[16][16];
    . . .
    for (row = 0; row < 16; row += 1) {
        for (col = 0; col < 16; col += 4) {
            cost += abs(a[row][col+0] - b[row][col+0]);
            cost += abs(a[row][col+1] - b[row][col+1]);
            cost += abs(a[row][col+2] - b[row][col+2]);
            cost += abs(a[row][col+3] - b[row][col+3]);
        }
    }

Figure 4-10. Unrolled, but not parallel, version of the loop from Figure 4-9.
Excluding the array accesses, the loop body in Figure 4-11 is now recognizable as the function performed by the ume8uu custom operation: the sum of 4 absolute values of 4 differences. To use the ume8uu operation, however, the code must access the arrays with 32-bit word pointers instead of with 8-bit byte pointers.

Figure 4-13 shows the loop recoded to access a[][] and b[][] as one-dimensional instead of two-dimensional arrays. We take advantage of our knowledge of C-language array storage conventions to perform this code transformation. Recoding to use one-dimensional arrays prepares the code for transformation to 32-bit array accesses.

(From here on, until the final code is shown, the declarations of the a and b arrays will be omitted from the code fragments for the sake of brevity.)

Figure 4-14 shows the loop of Figure 4-13 recoded to use ume8uu. Once again taking advantage of our knowledge of the C-language array storage conventions, the one-dimensional byte array is now accessed as a one-dimensional 32-bit-word array. The declaration of the pointers ia and ib as pointers to integers is the key, but also notice that the multiplier in the expression for row offset has been scaled from 16 to 4 to account for the fact that there are 4 bytes in a 32-bit word.

Of course, since we are now using one-dimensional arrays to access the pixel data, it is natural to use a single for loop instead of two. Figure 4-12 shows this streamlined version of the code without the inner loop. Since C-language arrays are stored as a linear vector of values, we can simply increase the number of iterations of the outer loop from 16 to 64 to traverse the entire array.

The recoding and use of the ume8uu operation has resulted in a substantial improvement in the performance of the match-cost loop.
in t he original version, the code executed 1280 operations (including loads, adds, sub- tracts, and absolute values); in the restructured version, there are only 256 operations?128 loads, 64 ume8uu operations, and 64 additions. th is is a factor of five re- duction in the number of oper ations executed. also, the unsigned char a[16][16]; unsigned char b[16][16]; . . . for (row = 0; row < 16; row += 1) { for (col = 0; col < 16; col += 4) { cost0 = abs(a[row][col+0] ? b[row][col+0]); cost1 = abs(a[row][col+1] ? b[row][col+1]); cost2 = abs(a[row][col+2] ? b[row][col+2]); cost3 = abs(a[row][col+3] ? b[row][col+3]); cost += cost0 + cost1 + cost2 + cost3; figure 4-11. parallel version of figure 4-10 . figure 4-12. the loop of figure 4-14 with the inner loop eliminated. unsigned int *ia = (unsigned int *) a; unsigned int *ib = (unsigned int *) b; for (i = 0; i < 64; i += 1) cost += ume8uu(ia[i], ib[i]); figure 4-13. the loop of figure 4-11 recoded with one-dimensional array accesses. unsigned char a[16][16]; unsigned char b[16][16]; . . . unsigned char *ca = a; unsigned char *cb = b; for (row = 0; row < 16; row += 1) { int rowoffset = row * 16; for (col = 0; col < 16; col += 4) { cost0 = abs(ca[rowoffset + col+0] ? cb[rowoffset + col+0]); cost1 = abs(ca[rowoffset + col+1] ? cb[rowoffset + col+1]); cost2 = abs(ca[rowoffset + col+2] ? cb[rowoffset + col+2]); cost3 = abs(ca[rowoffset + col+3] ? cb[rowoffset + col+3]); cost += cost0 + cost1 + cost2 + cost3;
pnx1300/01/02/11 data book philips semiconductors 4-10 preliminary specification overhead of the inner loop has been eliminated, further increasing the performance advantage. 4.4.2 more unrolling the code transformations of the previous section achieved impressive performance improvements, but given the vliw nature of the pnx1300 cpu, more can be done to exploit pnx1300?s parallelism. the code in figure 4-12 has a loop containing only 4 op- erations (excluding loop overhead). since pnx1300?s branches have a 3-instruction delay and each instruction can contain up to 5 operatio ns, a fully utilized minimum- sized loop can contain 16 operations (20 minus loop overhead). the pnx1300 compilation system performs a wide vari- ety of powerful code transformation and scheduling opti- mizations to ensure that t he vliw capabilities of the cpu are exploited. it is still wise, however, to make pro- gram parallelism explicit in source code when possible. explicit parallelism can only help the compiler produce a fast running program. to this end, we can unroll the loop of figure 4-12 some number of times to create explicit parallelism and help the compiler create a fast running loop. in this case, where the number of iterat ions is a power-of-two, it makes sense to unroll by a fa ctor that is a power-of-two to create clean code. figure 4-15 shows the loop unrolled by a factor of eight. the compiler can apply commo n sub-expression elimi- nation and other optimizations to eliminate extraneous operations in the array indexing, but, again, improve- ments in the source code can only help the compiler pro- duce the best possible code and fastest-running pro- gram. figure 4-16 shows one way to modify the code for sim- pler array indexing. figure 4-14. the loop of figure 4-13 recoded with 32-bit array accesses and the ume8uu custom operation. 
unsigned int *ia = (unsigned int *) a; unsigned int *ib = (unsigned int *) b; for (row = 0; row < 16; row += 1) { int rowoffset = row * 4; for (col4 = 0; col4 < 4; col4 += 1) cost += ume8uu(ia[rowoffset + col4], ib[rowoffset + col4]); } unsigned int *ia = (unsigned int *) a; unsigned int *ib = (unsigned int *) b; for (i = 0; i < 64; i += 8) { cost0 = ume8uu(ia[i+0], ib[i+0]); cost1 = ume8uu(ia[i+1], ib[i+1]); cost2 = ume8uu(ia[i+2], ib[i+2]); cost3 = ume8uu(ia[i+3], ib[i+3]); cost4 = ume8uu(ia[i+4], ib[i+4]); cost5 = ume8uu(ia[i+5], ib[i+5]); cost6 = ume8uu(ia[i+6], ib[i+6]); cost7 = ume8uu(ia[i+7], ib[i+7]); cost += cost0 + cost1 + cost2 + cost3 + cost4 + cost5 + cost6 + cost7; } figure 4-15. unrolled version of figure 4-12 . this code makes good use of pnx1300?s vliw capabili- ties. unsigned char a[16][16]; unsigned char b[16][16]; . . . unsigned int *ia = (unsigned int *) a; unsigned int *ib = (unsigned int *) b; for (i = 0; i < 64; i += 8, ia += 8, ib += 8) { cost0 = ume8uu(ia[0], ib[0]); cost1 = ume8uu(ia[1], ib[1]); cost2 = ume8uu(ia[2], ib[2]); cost3 = ume8uu(ia[3], ib[3]); cost4 = ume8uu(ia[4], ib[4]); cost5 = ume8uu(ia[5], ib[5]); cost6 = ume8uu(ia[6], ib[6]); cost7 = ume8uu(ia[7], ib[7]); cost += cost0 + cost1 + cost2 + cost3 + cost4 + cost5 + cost6 + cost7; } figure 4-16. code from figure 4-15 with simplified array index calculations.
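The ume8uu operation exists only on the PNX1300 DSPCPU, so the figures above cannot be compiled elsewhere as written. The following is a minimal portable sketch for checking the transformation on a host machine. It assumes, from the description in this section, that ume8uu returns the sum of the absolute differences of the four unsigned bytes packed in its two 32-bit arguments. The names ume8uu_ref, match_cost_bytes, and match_cost_words are hypothetical and are not part of the PNX1300 tool chain.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Portable model of ume8uu. Assumption: the operation sums the absolute
   differences of the four unsigned bytes held in each 32-bit argument. */
static uint32_t ume8uu_ref(uint32_t x, uint32_t y)
{
    uint32_t sum = 0;
    for (int lane = 0; lane < 4; lane += 1) {
        int xb = (int) ((x >> (8 * lane)) & 0xffu);
        int yb = (int) ((y >> (8 * lane)) & 0xffu);
        sum += (uint32_t) abs(xb - yb);
    }
    return sum;
}

/* Byte-at-a-time match cost over a 16x16 block stored row by row,
   equivalent to the loop of figure 4-9. */
static uint32_t match_cost_bytes(const unsigned char *a, const unsigned char *b)
{
    uint32_t cost = 0;
    for (int row = 0; row < 16; row += 1)
        for (int col = 0; col < 16; col += 1)
            cost += (uint32_t) abs(a[row * 16 + col] - b[row * 16 + col]);
    return cost;
}

/* Word-at-a-time match cost, shaped like the loop of figure 4-12.
   memcpy is used instead of a pointer cast so the host compiler's
   aliasing and alignment rules are respected. */
static uint32_t match_cost_words(const unsigned char *a, const unsigned char *b)
{
    uint32_t cost = 0;
    for (int i = 0; i < 64; i += 1) {
        uint32_t wa, wb;
        memcpy(&wa, a + 4 * i, sizeof wa);
        memcpy(&wb, b + 4 * i, sizeof wb);
        cost += ume8uu_ref(wa, wb);
    }
    return cost;
}
```

Because both words are loaded with the same byte order, the per-lane differences do not depend on the endianness of the host, so the two functions should agree on any pair of blocks.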
Cache Architecture
Chapter 5
by Eino Jacobs

5.1 Memory system overview

In this document, the generic PNX1300 name refers to the PNX1300 series, or the PNX1300/01/02/11 products.

The high-performance video and audio throughput of PNX1300 is implemented by its DSPCPU and autonomous I/O and co-processing units, but the foundation of this processing is the PNX1300 memory hierarchy. To get the full potential of the chip's processing units, the memory hierarchy must read and write data (and DSPCPU instructions) fast enough to keep the units busy.

To meet the requirements of its target applications, PNX1300's memory hierarchy must satisfy the conflicting goals of low cost, simple system design (e.g., low parts count), and high performance. Since multimedia video streams can require relatively large temporary storage, a significant amount of external DRAM is required. Minimizing the cost of bulk memory is important.

PNX1300's memory system achieves a good compromise between cost and performance by coupling substantial on-chip caches with a glueless interface to synchronous DRAM (SDRAM). SDRAM provides higher bandwidth than standard DRAM for only a small cost premium. A block diagram of the memory system is shown in figure 5-1. SDRAM permits PNX1300 to use a narrower and simpler interface than would be required to achieve similar performance with standard DRAM.

Figure 5-1. The main components of the PNX1300 memory system. [Block diagram: the VLIW CPU with three branch units (three sets of signals, each with address, opcode, condition, and guard) and two memory units (two sets of signals, each with a guard, opcode, data, and two address components); the decompressor, which delivers 224 bits of decompressed instruction, fed by the 32-KB, 8-way instruction cache; the 16-KB, 8-way data cache; the main-memory interface; the internal data highway (32-bit address, 32-bit data) to the on-chip peripherals; and the glueless main-memory bus (SDRAM control with 32-bit data) to SDRAM main memory.]

The separate on-chip data and instruction caches serve only the DSPCPU, since the data access patterns of the autonomous I/O and graphics units exhibit little or no locality of reference (they access each piece of the multimedia data stream only once in each operation).

Without the caches, the CPU would not be able to achieve its performance potential. SDRAM has enough bandwidth to handle serial streams of multimedia data, but its bandwidth and latency are insufficient to satisfy the CPU's high rate of random data accesses and repeated instruction accesses.

Table 5-1 shows bandwidth parameters for the PNX1300 DSPCPU and the main-memory interface. Although 400 MB/s is a lot of bandwidth, it is clear that the SDRAM alone cannot keep up with the CPU's maximum requirements for instructions and data. Luckily, multimedia algorithms resemble other computer programs in terms of locality of reference, so the on-chip caches typically supply the majority of instructions and data to the DSPCPU. The wide paths to the caches are matched to the bandwidth requirements of the DSPCPU.

Table 5-1. 100-MHz PNX1300 memory bandwidth parameters

    Magnitude    Use
    2800 MB/s    Instruction bandwidth (224 bits/instruction)
    800 MB/s     Data bandwidth (two 32-bit memory ports)
    400 MB/s     Main-memory bandwidth (one 32-bit port)

To improve cache behavior and thus program performance, the caches have a locking mechanism. In addition, the instruction cache is coupled with an instruction decompression unit. The compressed instruction format improves the cache hit rate and reduces the bus bandwidth required between main memory and cache. Instructions in main memory and cache use the compressed format.

PNX1300's processing units access the external SDRAM through the on-chip central "data highway" bus. The highway consists of separate 32-bit address and data buses, and use of the bus is mediated by the main-memory interface unit. The main-memory interface contains the SDRAM controller and a central arbiter that determines how much of the available SDRAM memory bandwidth is allocated to each unit. Unused bandwidth is always made available to the VLIW CPU for cache refill and memory accesses that bypass the caches.

Table 5-2 gives a summary description of each component of PNX1300's memory system.

Table 5-2. Summary of memory system characteristics

    Unit: Branch units
    Description: Branch units execute branch operations. Up to three branch operations can be executed in parallel, but the program must guarantee that only one branch is taken.

    Unit: Decompression unit
    Description: Instructions are stored in memory and in the instruction cache in a space-saving, compressed format. The decompression unit expands instructions to their full, 28-byte size before they are issued to the CPU.

    Unit: Instruction cache
    Description: The instruction cache holds 32 KB, is 8-way set-associative, and has a 64-byte block size. A miss in a block causes the entire block to be read from SDRAM. The cache can sustain an issue rate of one instruction per cycle on cache hits.

    Unit: Memory units
    Description: Memory units execute load and store operations. The data cache is dual ported to allow the memory units to operate concurrently.

    Unit: Data cache
    Description: The data cache holds 16 KB, is 8-way set-associative, has a 64-byte block size, and implements a copyback, allocate-on-write policy. A miss in a block causes the entire block to be read from SDRAM. The cache supports memory-mapped I/O through non-cacheable address regions.

    Unit: Data highway
    Description: The on-chip data highway bus serves all on-chip units. The highway has separate 32-bit data and address buses. Bus bandwidth is allocated by the highway arbiter according to one of several modes.

    Unit: Main-memory interface
    Description: The main-memory interface contains the data-highway access arbiter, the SDRAM controller, and MMIO logic.

    Unit: SDRAM main memory
    Description: External SDRAM connects gluelessly to PNX1300 over the 32-bit main-memory bus.

5.2 DRAM aperture

PNX1300 implements a 32-bit, byte-addressed linear address space. Within that address space, PNX1300 supports several different apertures for specific purposes. The DRAM aperture describes the part of the address space into which the external SDRAM is mapped. SDRAM must consist of a single, contiguous region of memory, which is the most practical configuration for PNX1300 systems.

The location and size of the DRAM aperture are defined by two registers, dram_base and dram_limit. These registers are both readable and writeable as MMIO registers and as PCI configuration space registers. The view of the registers in MMIO space is shown in figure 5-2. The view of the registers in PCI configuration space is described in chapter 11, "PCI interface." In normal operation, the base address registers are assigned once during boot and are not changed while the DSPCPU is running. Refer to chapter 11, "PCI interface," and chapter 13, "System boot," for a description of this process.

dram_limit must be set equal to dram_base plus the actual size of the SDRAM present. The amount of SDRAM is not required to be a power of 2, but it must be a multiple of 64 KB. Note that the size of the aperture as set in the PCI configuration space can be larger, because it must be a power of 2.

A memory operation will access SDRAM if its address satisfies:

    [dram_base] <= address < [dram_limit]

Any address outside this range cannot access SDRAM.

When PNX1300 is reset, dram_base_field is set to 0x0 and dram_limit is set to 0x0010 0000 (a 1-MB DRAM aperture starting at address 0x0). The boot process described in chapter 13, "System boot," overrides these initial settings.

Figure 5-2. Formats of the dram_base and dram_limit registers. [Register layout: dram_base (r/w) at mmio_base offset 0x10 0000, with field dram_base_field; dram_limit (r/w) at mmio_base offset 0x10 0004, with field dram_limit_field; the remaining low-order bits of both registers are shown as zeros.]
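The aperture rule of section 5.2 can be sketched in a few lines of C. This is an illustration only: the function names in_dram_aperture and dram_limit_for are hypothetical, and the register values are passed in as plain integers rather than read from the real MMIO registers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SDRAM_GRANULE 0x10000u  /* SDRAM size must be a multiple of 64 KB */

/* True if a memory operation at this address accesses SDRAM:
   dram_base <= address < dram_limit. */
static bool in_dram_aperture(uint32_t dram_base, uint32_t dram_limit,
                             uint32_t address)
{
    return address >= dram_base && address < dram_limit;
}

/* Value to program into dram_limit for a given base and SDRAM size.
   The size need not be a power of 2, only a multiple of 64 KB. */
static uint32_t dram_limit_for(uint32_t dram_base, uint32_t sdram_bytes)
{
    assert(sdram_bytes % SDRAM_GRANULE == 0);
    return dram_base + sdram_bytes;
}
```

With the reset values (base 0x0, limit 0x0010 0000), address 0x000f ffff falls inside the aperture and address 0x0010 0000 falls outside it.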
5.3 Data cache

The data cache serves only the DSPCPU and is controlled by two memory units that execute the load and store operations issued by the DSPCPU. The following sections describe the data cache and its operation; table 5-3 summarizes the important characteristics for easy reference.

Table 5-3. Summary of data cache characteristics

    Characteristic: Cache size
    PNX1300 implementation: 16 KB

    Characteristic: Cache associativity
    PNX1300 implementation: 8-way set-associative

    Characteristic: Block size
    PNX1300 implementation: 64 bytes

    Characteristic: Valid bits
    PNX1300 implementation: One valid bit per 64-byte block

    Characteristic: Dirty bits
    PNX1300 implementation: One dirty bit per 64-byte block

    Characteristic: Miss transfer order
    PNX1300 implementation: Miss transfers begin with the critical word first

    Characteristic: Replacement policies
    PNX1300 implementation: Copyback, allocate on write, hierarchical LRU

    Characteristic: Endianness
    PNX1300 implementation: Either little- or big-endian, determined by a pcsw bit

    Characteristic: Ports
    PNX1300 implementation: The cache is quasi dual ported; two accesses can proceed concurrently if they reference different banks (determined by bits [4:2] of the computed addresses)

    Characteristic: Alignment
    PNX1300 implementation: Accesses must be naturally aligned (32-bit words on 32-bit boundaries, 16-bit half-words on 16-bit boundaries); the appropriate number of LSBs of unaligned addresses are set to zero. For misaligned stores, pcsw.mse is asserted to generate an exception

    Characteristic: Partial word operations
    PNX1300 implementation: The cache implements 8-bit and 16-bit accesses with the same performance as 32-bit accesses

    Characteristic: Operation latency
    PNX1300 implementation: Three cycles for both load and store operations

    Characteristic: Coherency enforcement
    PNX1300 implementation: Software uses special operations to enforce cache coherency

    Characteristic: Cache locking
    PNX1300 implementation: Up to 1/2 of the cache contents (four out of 8 blocks of each set) can be locked; granularity is 64 bytes

    Characteristic: Non-cacheable region
    PNX1300 implementation: One non-cacheable aperture in the DRAM address space is supported

5.3.1 General cache parameters

The PNX1300 data cache is 16 KB in size with a 64-byte block size. Thus, it contains 256 blocks, each with its own address tag. The cache is 8-way set-associative, so there are 32 sets, each containing 8 tags. A single valid bit is associated with a block, so each block and its associated address tag is either entirely valid in the cache or invalid. On a cache miss, 64 bytes are read from SDRAM to make the entire block valid.

Each block also contains a dirty bit, which is set whenever a write to the block occurs. Each set contains 10 bits to support the hierarchical LRU replacement policy.

The geometry of the data cache is available to software by reading the MMIO register dc_params. Figure 5-3 shows the format of the dc_params register; table 5-4 lists its field values. The product of block size, associativity, and number of sets gives the total cache size (16 KB in this case).

Figure 5-3. Format of the dc_params register. [Register layout: dc_params (r/o) at mmio_base offset 0x10 001c, with fields blocksize, associativity, and number_of_sets.]

Table 5-4. dc_params field values

    Field name        Value
    blocksize         64
    associativity     8
    number_of_sets    32

5.3.2 Address mapping

PNX1300 data addresses are mapped onto the data cache storage structure as shown in figure 5-4. A data address is partitioned into four fields as described in table 5-5.

Table 5-5. Data address field partitioning

    Field   Address bits   Purpose
    byte    1..0           Byte offset within a word for byte or half-word accesses
    word    5..2           Selects one of the words in a block (one of 16 words in the case of PNX1300)
    set     10..6          Selects one of the sets in the cache (one of 32 in the case of PNX1300)
    tag     31..11         Compared against address tags of set members

Figure 5-4. Data cache address partitioning. [Data cache address layout: tag in bits 31..11, set in bits 10..6, word in bits 5..2, byte in bits 1..0.]
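The field partitioning of table 5-5, together with the bank-selection bits [4:2] listed in table 5-3, can be expressed as a short C sketch. The type dcache_addr and the functions dcache_split and dcache_bank_conflict are hypothetical names chosen for illustration; the bit positions are the ones given in the tables.

```c
#include <stdbool.h>
#include <stdint.h>

/* Data-cache geometry from table 5-4: 64 * 8 * 32 = 16384 bytes. */
enum { DC_BLOCKSIZE = 64, DC_ASSOCIATIVITY = 8, DC_NUMBER_OF_SETS = 32 };

typedef struct {
    uint32_t tag;   /* address bits 31..11 */
    uint32_t set;   /* address bits 10..6  */
    uint32_t word;  /* address bits 5..2   */
    uint32_t byte;  /* address bits 1..0   */
    uint32_t bank;  /* address bits 4..2, selects one of eight banks */
} dcache_addr;

/* Split a data address into the fields used by the data cache. */
static dcache_addr dcache_split(uint32_t address)
{
    dcache_addr f;
    f.byte = address & 0x3u;
    f.word = (address >> 2) & 0xfu;
    f.set  = (address >> 6) & 0x1fu;
    f.tag  = address >> 11;
    f.bank = (address >> 2) & 0x7u;
    return f;
}

/* Two simultaneously issued accesses compete for a bank when
   bits [4:2] of their addresses are equal. */
static bool dcache_bank_conflict(uint32_t address_a, uint32_t address_b)
{
    return dcache_split(address_a).bank == dcache_split(address_b).bank;
}
```

For example, accesses to adjacent words such as 0x100 and 0x104 select different banks, while accesses 32 bytes apart, such as 0x100 and 0x120, select the same bank.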
5.3.3 Miss processing order

When a miss occurs, the data cache fills the block containing the requested word, starting with the critical word. The CPU is stalled until the first word is transferred. The rest of the block is then filled while the CPU keeps running.

5.3.4 Replacement policies, coherency

The cache implements a copyback replacement policy with one dirty bit per 64-byte block. Thus, when a miss occurs and the block selected for replacement has its dirty bit set, the dirty block must be written to main memory to preserve its modified contents. On PNX1300, the dirty block is written to memory before the needed block is fetched.

Coherency is not maintained in any way by hardware between the data cache, the instruction cache, and main memory. Special operations are available to implement cache coherency in software. See section 5.6, "Cache coherency," for a discussion of coherency issues.

Write misses are handled with an allocate-on-write policy: the write that caused the miss stores its data in the cache after the missing block is fetched into the cache.

The cache implements a hierarchical LRU replacement algorithm to determine which of the eight elements (blocks) in a set is replaced. The algorithm partitions the eight set elements into four groups, each group with two elements. The hierarchical LRU replacement victim is determined by selecting the least-recently used group of two elements and then selecting the least-recently used element in that group. This hierarchical algorithm yields performance close to full LRU but is simpler to implement.

See section 5.5, "LRU algorithm," for a full discussion of the LRU algorithm.

5.3.5 Alignment, partial-word transfers, endian-ness

The cache implements 32-bit word, 16-bit half-word, and 8-bit byte transfers. All transfers, however, must be to addresses that are naturally aligned; that is, 32-bit words must be aligned on 32-bit boundaries, and 16-bit half-words must be aligned on 16-bit boundaries.

Like other PNX1300 processing units, the CPU can use either big- or little-endian byte order. It is recommended that all units and the CPU run with the same endian-ness. A detailed description of endian-ness can be found in appendix C, "Endian-ness."

5.3.6 Dual ports

To allow two accesses to proceed in parallel, the data cache is quasi-dual ported. The cache is implemented as eight banks of single-ported memory, but the hardware allows each bank to operate independently. Thus, when the addresses of two simultaneous accesses select two different banks, both accesses can complete simultaneously. Bank selection is determined by the three low-order address bits [4..2] of each address. Thus, the words in a 64-byte cache block are distributed among the eight banks, which prevents conflicts between two simultaneously issued accesses to adjacent words in a cache block.

The PNX1300 compiling system attempts to avoid bank conflicts as much as possible.

The dual-ported cache can execute the load and store opcodes (ild8d, uld8d, ild16d, uld16d, ld32d, h_st8d, h_st16d, h_st32d, ild8r, uld8r, ild16r, uld16r, ld32r, ild16x, uld16x, ld32x) in either or both of the two ports. The special opcodes alloc, dcb, dinvalid, pref, rdtag, and rdstatus can only be executed in the second port, not in the first port. Whenever any of these special opcodes is issued in the second port, there should not be a concurrent load or store operation in the first. This is a special scheduling constraint.

5.3.7 Cache locking

The data cache allows the contents of up to one-half of its blocks to be locked. Thus, on PNX1300, up to 8 KB of the cache can be used as a high-speed local data memory. Only four out of eight blocks in any set can be locked.
A locked block is never chosen as a victim by the replacement algorithm; its contents remain undisturbed until either (1) the block's locked status is changed explicitly by software, or (2) a dinvalid operation is executed that targets the locked block.

Cache locking occurs only for the data in the address range described by the MMIO registers dc_lock_addr and dc_lock_size. The granularity of the address range is one 64-byte cache block. The MMIO register dc_lock_ctl contains the cache-locking enable bit dc_lock_enable. Figure 5-5 shows the layout of the data-cache lock registers. Locking will occur for an address if locking is enabled and both of the following are true:

1. The address is greater than or equal to the value in dc_lock_addr.
2. The address is less than the sum of the values in dc_lock_addr and dc_lock_size.

Programmers (or compilers) must combine all data that needs to be locked into this single linear address range.

Setting dc_lock_enable to "1" causes the following sequence of events:

1. All blocks that are in cache locations that will be used for locking are copied back to main memory (if they are dirty) and removed from the cache.
2. All blocks in the lock range are fetched from main memory into the cache. If any block in the lock range was already in the cache, it is first copied back into main memory (if it is dirty) and invalidated.
3. The LRU status of any set that contains locked blocks is set to the initialization value.
4. Cache locking is activated so that the locked blocks cannot be victims of the replacement algorithm.

This sequence of events is triggered by writing "1" to dc_lock_enable even if the enable is already set to "1". Setting dc_lock_enable to "0" causes no action except to allow the previously locked blocks to be replacement victims.

To program a new lock range, the following sequence of operations is used:

1. Disable cache locking by writing "0" to dc_lock_enable.
2. Define a new lock range by writing to dc_lock_addr and dc_lock_size.
3. Enable cache locking by writing "1" to dc_lock_enable.

Dirty locked blocks can be written back to main memory while locking is enabled by executing copyback operations in software.

Programmer's note: Software should not execute dinvalid operations on a locked block. If it does, the block will be removed from the cache, creating a "hole" in the lock range (and the data cache) that cannot be reused until locking is deactivated.

Cache locking is disabled by default when PNX1300 is reset. The reserved field in dc_lock_ctl should be ignored on reads and written as all zeroes.

Locking should not be enabled by PCI accesses to the MMIO registers.

Figure 5-5. Formats of the registers in charge of data-cache locking. [Register layouts: dc_lock_ctl (r/w) at mmio_base offset 0x10 0010, with fields dc_lock_enable, aperture_control (bits 6..5), and reserved; dc_lock_addr (r/w) at mmio_base offset 0x10 0014, with field dc_lock_address; dc_lock_size (r/w) at mmio_base offset 0x10 0018, with field dc_lock_size.]

5.3.8 Memory hole and PCI aperture disable

Bits 6 and 5 in dc_lock_ctl comprise the aperture_control field. This field can be used to change the memory map as seen by the DSPCPU, as listed in table 5-6. The hardware reset value of the field corresponds to the memory map as described in section 3.4.1, "Memory map."

Table 5-6. Aperture control field

    Value        Memory map properties
    00 (reset)   Normal operation memory map (section 3.4.1): loads to 0..0xff always return 0 and cause no PCI read (memory hole is enabled); PCI aperture(s) are enabled
    01           Loads to address 0..0xff cause a PCI read, i.e. the memory hole is disabled; PCI aperture(s) are enabled
    10           PCI apertures are disabled for loads; loads return a 0 and cause no PCI read
    11           Reserved for future extensions

5.3.9 Non-cacheable region

The data cache supports one non-cacheable address region within the DRAM address space aperture. The base address of this region is determined by the value in the dram_cacheable_limit MMIO register, which is shown in figure 5-6. Since uncached memory operations always incur many stall cycles, the non-cacheable region should be used sparingly.

A memory operation is non-cacheable if its target address satisfies:

    [dram_cacheable_limit] <= address < [dram_limit]

Thus, the non-cacheable region is at the high end of the DRAM aperture. The format of the dram_cacheable_limit register forces the size of the non-cacheable region to be a multiple of 64 KB.

When PNX1300 is reset, dram_cacheable_limit is set equal to dram_limit, which results in a zero-length non-cacheable region.

Programmer's note: When dram_cacheable_limit is changed to enlarge the region that is non-cacheable, software must ensure coherency. This is accomplished by explicitly copying back dirty data (using dcb operations) and invalidating (using dinvalid operations) the cache blocks in the region that was previously cacheable.

Figure 5-6. Format of the dram_cacheable_limit register. [Register layout: dram_cacheable_limit (r/w) at mmio_base offset 0x10 0008, with field dram_cacheable_limit_field; the low-order 16 bits are shown as zeros.]
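The two address tests of sections 5.3.7 and 5.3.9, and the three-step sequence for programming a new lock range, can be sketched as follows. This is a host-side illustration under stated assumptions: the struct lock_regs is a hypothetical stand-in for the dc_lock_ctl enable bit and the dc_lock_addr and dc_lock_size registers, not the real MMIO layout, and on the target the writes would go to volatile MMIO locations and trigger the hardware sequence described above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software stand-in for the data-cache lock registers. */
typedef struct {
    uint32_t lock_enable;  /* models dc_lock_enable */
    uint32_t lock_addr;    /* models dc_lock_addr   */
    uint32_t lock_size;    /* models dc_lock_size   */
} lock_regs;

/* An address is locked when locking is enabled and
   lock_addr <= address < lock_addr + lock_size.
   The sum is formed in 64 bits so that it cannot wrap around. */
static bool is_locked(const lock_regs *r, uint32_t address)
{
    return r->lock_enable != 0 &&
           address >= r->lock_addr &&
           (uint64_t) address < (uint64_t) r->lock_addr + r->lock_size;
}

/* Program a new lock range: disable, define the range, enable. */
static void set_lock_range(lock_regs *r, uint32_t addr, uint32_t size)
{
    r->lock_enable = 0;   /* step 1: disable cache locking  */
    r->lock_addr = addr;  /* step 2: define the new range   */
    r->lock_size = size;
    r->lock_enable = 1;   /* step 3: enable cache locking   */
}

/* A memory operation is non-cacheable when
   dram_cacheable_limit <= address < dram_limit. */
static bool is_noncacheable(uint32_t dram_cacheable_limit,
                            uint32_t dram_limit, uint32_t address)
{
    return address >= dram_cacheable_limit && address < dram_limit;
}
```

When dram_cacheable_limit equals dram_limit, as it does after reset, is_noncacheable returns false for every address, which matches the zero-length region described in the text.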
5.3.10 Special data cache operations

A program can exercise some control over the operation of the data cache by executing special operations. The special operations can cause the data cache to initiate the copyback or invalidation of a block in the cache. These operations are typically used by software to keep the cache coherent with main memory.

In addition, there are special operations that allow a program to read tag and status information from the data cache.

Special data cache operations are always executed on the memory port associated with issue slot 5.

5.3.10.1 Copyback and invalidate operations

The data cache controller recognizes a copyback and an invalidate operation as shown in table 5-7.

Table 5-7. Copyback and invalidate operations

    Mnemonic: dcb(offset) rsrc1
    Description: Data-cache copyback block. Causes the block that contains the target address to be copied back to main memory if the block is valid and dirty.

    Mnemonic: dinvalid(offset) rsrc1
    Description: Data-cache invalidate block. Causes the block that contains the target address to be invalidated. No copyback occurs even if the block is dirty.

The dcb and dinvalid operations both compute a target word address that is the sum of a register and a scaled seven-bit offset. The offset can be in the range [-256..252] and must be divisible by four.

dcb operation. The dcb operation computes the target address, and if the block containing the address is found in the data cache, its contents are written back to main memory if the block is both valid and dirty. If the block is not present, not valid, or not dirty, no action results from the dcb operation.

If the dcb causes a copyback to occur, the CPU is stalled until the copyback completes. If the block is not in the cache, the operation causes no stall cycles. If the block is in the cache but not dirty, the operation causes 4 stall cycles. If the block is dirty, the dcb operation causes a writeback and takes at least 19 stall cycles.

The dcb operation clears the dirty bit but leaves a valid copy of the written-back block in the cache.

dinvalid operation. The dinvalid operation computes the target address, and if the block containing the address is found in the data cache, its valid and dirty bits are cleared. No copyback operation will occur even if the block is valid and dirty prior to executing the dinvalid operation. The CPU is stalled for 2 cycles if the target block is in the cache; otherwise, no stall cycles occur.

A dinvalid or dcb operation updates the LRU information to least recently used in its set.

Programmer's note: Software should not execute dinvalid operations on locked blocks; otherwise, a "hole" is created that cannot be reused until locking is deactivated.

5.3.10.2 Data cache tag and status operations

The data cache controller recognizes two DSPCPU operations for reading cache status as shown in table 5-8.

Table 5-8. Cache read-status operations

    Mnemonic: rdtag(offset) rsrc1
    Description: Read data-cache tag. The target address selects a data-cache block directly; the operation returns a 32-bit result containing the 21-bit cache tag and the valid bit.

    Mnemonic: rdstatus(offset) rsrc1
    Description: Read data-cache status. The target address selects a data-cache set directly; the operation returns a 32-bit result containing the set's eight dirty bits and ten LRU bits.

The rdtag and rdstatus operations both compute a target word address that is the sum of a register and a scaled seven-bit offset. The offset must be divisible by four and in the range [-256..252].

rdtag operation. The target address computed by rdtag selects the data cache block by specifying the cache set and set element directly. Address bits [10..6] specify the cache set (one of 32), and bits [13..11] specify the set element (one of eight). All other target address bits are ignored. This operation causes no CPU stall cycles.

The result of the rdtag operation is a full 32-bit word with the format shown in figure 5-7.

rdstatus operation. The target address computed by rdstatus selects the data cache set by specifying the set number directly. Address bits [10..6] specify the cache set (one of 32); all other target address bits are ignored. This operation causes 1 CPU stall cycle.

The result of the rdstatus operation is a full 32-bit word with the format shown in figure 5-7. See section 5.6.7, "LRU bit definitions," for a description of the LRU bits.

Figure 5-7. Result formats for rdtag and rdstatus operations. [rdtag result format: tag and valid fields, with the remaining bits zero; rdstatus result format: lru and dirty fields, with the remaining bits zero.]
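The target addresses that rdtag and rdstatus expect can be composed from a set number and a set element, using the bit positions given in section 5.3.10.2. The sketch below is illustrative only; the helper names rdtag_target, rdstatus_target, and offset_is_encodable are hypothetical, and the code builds addresses without issuing the operations themselves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Target address for rdtag: bits [10..6] select the set (one of 32),
   bits [13..11] select the set element (one of eight). */
static uint32_t rdtag_target(unsigned set, unsigned element)
{
    assert(set < 32 && element < 8);
    return ((uint32_t) element << 11) | ((uint32_t) set << 6);
}

/* Target address for rdstatus: bits [10..6] select the set. */
static uint32_t rdstatus_target(unsigned set)
{
    assert(set < 32);
    return (uint32_t) set << 6;
}

/* The offset of dcb, dinvalid, rdtag, and rdstatus is a scaled
   seven-bit value: a multiple of four in the range [-256..252]. */
static bool offset_is_encodable(int offset)
{
    return offset >= -256 && offset <= 252 && offset % 4 == 0;
}
```

Iterating set over 0..31 and element over 0..7 visits all 256 blocks of the data cache, which is one way a diagnostic routine might walk the tags.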
philips semiconductors cache architecture preliminary specification 5-7 5.3.10.3 data cache allocation operation the data cache controller recognizes allocation opera- tions as shown in table 5-9 . the allocation operations al- locate a block and set the status of this block to valid. no data is fetched from main memory. the allocated block is undefined after this operation. the programmer has to fill it with valid data by store operations. allocation oper- ations to apertures othe r than cacheable dram will be discarded. allocation of a non-dirty block causes 3 stall cycles. allocation of a dirty block will cause writeback of this block to the sdram and ta ke at least 11 stall cycles. 5.3.10.4 data cache prefetch operation the data cache controller recognizes prefetch opera- tions as shown in table 5-10 . the prefetch operations load a full cache block from memory concurrently with other computation. if the prefetched block is already in cache, no data is fetched from main memory. prefetch operations to other apertures than cacheable dram are discarded. this operation is not guaranteed to execute, it will not execute if the cach e is already occupied with two cache misses when the operation is issued. the prefetch operations cause 3 stall cycles if there is no copyback of a dirty block. if a dirty block is the target of the prefetch, the dirty block will be written back to sdram, and at least 11 stall cycles are taken. 5.3.11 memory operation ordering the pnx1300 memory system implements traditional or- dering for memory operations that are issued in different clock cycles. that is, the ef fects of a memory operation issued in cycle j occur before the effects of a memory op- eration issued in cycle j+1. for memory operations issued in the same cycle, howev- er, it is not possible to ex ecute memory operations in a traditional order. 
So long as the simultaneous memory operations access different addresses (aliasing is not possible in PNX1300), no problems can occur. If two simultaneous operations do access the same address, however, PNX1300 behavior is undefined. Specifically, two cases are possible:

1. When multiple values are written to the same address in the same cycle, the resulting value in memory is undefined.
2. When a read and a write occur to the same address in the same clock cycle, the value returned by the read is undefined.

The behavior of simultaneous accesses to the same address is undefined regardless of whether one or both memory operations hit in the cache.

Hidden memory system concurrency. Some cache operations may be overlapped with CPU execution. In general, a program cannot determine in what order cache misses will complete, nor can a program determine when and in what order copyback operations will complete. A program can, however, enforce the completion of copyback transactions to main memory because copyback and invalidate operations can complete only if pending copyback transactions for the same block have completed. Thus, a program can synchronize to the completion of a copyback operation by dirtying a block, issuing a copyback operation for the block, and then issuing an invalidate operation for the block.

Ordering of special memory operations. The following are special memory operations:

1. Loads or stores to MMIO addresses.
2. Non-cached loads or stores.
3. Any copyback or invalidate operation.
4. Loads or stores that cause a PCI-bus access.

The CPU is stalled until these special memory operations are completed; there is no overlap of CPU execution with these special memory operations. Thus, a programmer can assume that traditional memory operation ordering applies to special memory operations. Note, however, that ordering is undefined for two special memory operations issued in the same cycle.

Table 5-9. Data cache allocation operations

Mnemonic | Description
allocd(offset) rsrc1 | Data-cache allocate block with displacement. Causes the block with address (rsrc1 + offset) & (~(cache_block_size - 1)) to be allocated and set valid.
allocr rsrc1 rsrc2 | Data-cache allocate block with index. Causes the block with address (rsrc1 + rsrc2) & (~(cache_block_size - 1)) to be allocated and set valid.
allocx rsrc1 rsrc2 | Data-cache allocate block with scaled index. Causes the block with address (rsrc1 + 4 * rsrc2) & (~(cache_block_size - 1)) to be allocated and set valid.

Table 5-10. Data cache prefetch operations

Mnemonic | Description
prefd(offset) rsrc1 | Data-cache prefetch block with displacement. Causes the block with address (rsrc1 + offset) & (~(cache_block_size - 1)) to be prefetched.
prefr rsrc1 rsrc2 | Data-cache prefetch block with index. Causes the block with address (rsrc1 + rsrc2) & (~(cache_block_size - 1)) to be prefetched.
pref16x rsrc1 rsrc2 | Data-cache prefetch block with scaled 16-bit index. Causes the block with address (rsrc1 + 2 * rsrc2) & (~(cache_block_size - 1)) to be prefetched.
pref32x rsrc1 rsrc2 | Data-cache prefetch block with scaled 32-bit index. Causes the block with address (rsrc1 + 4 * rsrc2) & (~(cache_block_size - 1)) to be prefetched.
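The block-address expressions in Table 5-9 and Table 5-10 all share one pattern: form a sum, then clear the low-order bits to align it to a cache block. The following is a minimal sketch of that arithmetic, not part of the PNX1300 specification. The function names, the default cache_block_size of 64 bytes, and the 32-bit wraparound of the sum are assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF  # assumption: address arithmetic wraps at 32 bits


def block_address(base, displacement, cache_block_size=64):
    """Return (base + displacement) & ~(cache_block_size - 1).

    cache_block_size defaults to 64 bytes (an assumption) and must be
    a power of two so that the mask clears exactly the offset bits.
    """
    if cache_block_size <= 0 or cache_block_size & (cache_block_size - 1):
        raise ValueError("cache_block_size must be a power of two")
    return ((base + displacement) & MASK32) & ~(cache_block_size - 1)


def displacement_target(rsrc1, offset, cache_block_size=64):
    """Target block of allocd / prefd: rsrc1 + offset."""
    return block_address(rsrc1, offset, cache_block_size)


def index_target(rsrc1, rsrc2, cache_block_size=64):
    """Target block of allocr / prefr: rsrc1 + rsrc2."""
    return block_address(rsrc1, rsrc2, cache_block_size)


def scaled_index_target(rsrc1, rsrc2, scale, cache_block_size=64):
    """Target block of pref16x (scale=2) and allocx / pref32x (scale=4)."""
    return block_address(rsrc1, scale * rsrc2, cache_block_size)
```

For example, under these assumptions a displacement form with rsrc1 = 0x1000 and offset = 0x47 targets the block at 0x1040, because 0x1047 with its six low-order bits cleared is 0x1040.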
5.3.12 Operation Latency

Load and store operations have an operation latency of three cycles, regardless of the size of the data transfer.

5.3.13 MMIO Register References

Memory operations that reference MMIO registers are not cached, and the CPU is stalled until the MMIO reference completes. An MMIO register reference occurs when an address is in the range:

[MMIO_BASE] <= address < ([MMIO_BASE] + 0x200000)

The size of the MMIO aperture is hardwired at 2 MB.

5.3.14 PCI Bus References

Any CPU memory operation that references an address outside the SDRAM and MMIO address apertures is assumed to reference a device or memory on the PCI bus. PCI-bus data transfers are not cached, and the CPU is stalled until the PCI transfer completes.

5.3.15 CPU Stall Conditions

The data cache causes the CPU to stall when:

1. Any cache miss occurs.
2. Two simultaneously issued, cacheable memory operations need to access the same cache bank (bank conflict).
3. An access that references an address in the MMIO aperture is issued.
4. An access to the PCI bus is issued.
5. A non-trivial copyback or invalidate operation is issued.
6. An access to the non-cacheable region in the DRAM aperture is issued.

5.3.16 Data Cache Initialization

When PNX1300 is reset, the data cache executes an initialization sequence. The cache asserts the CPU stall signal while it sequentially resets all valid and dirty bits. The cache de-asserts the stall signal after completing the initialization sequence.

5.4 Instruction Cache

The instruction cache stores compressed CPU instructions; instructions are decompressed before being delivered to the CPU. The following sections describe the instruction cache and its operation; Table 5-11 summarizes instruction-cache characteristics.

5.4.1 General Cache Parameters

The PNX1300 instruction cache is 32 KB in size with a 64-byte block size.
Thus, the cache contains 512 blocks, each with its own address tag. The cache is 8-way set-associative, so there are 64 sets, each containing 8 tags. A single valid bit is associated with a block, so each block and associated address tag is either entirely valid or invalid; on a cache miss, 64 bytes are read from SDRAM to make the entire block valid.

The geometry of the instruction cache is available to software by reading the MMIO register IC_PARAMS. Figure 5-8 shows the format of the IC_PARAMS register; Table 5-12 lists its field values. The product of the block size, associativity, and number of sets gives the total cache size (32 KB in this case).

5.4.2 Address Mapping

PNX1300 instruction addresses are mapped onto the data cache storage structure as shown in Figure 5-9. An instruction address is partitioned into three fields as described in Table 5-13.

Table 5-11. Instruction cache characteristics

Characteristic | PNX1300 implementation
Cache size | 32 KB
Cache associativity | 8-way set-associative
Block size | 64 bytes
Valid bits | One valid bit per 64-byte block
Replacement policy | Hierarchical LRU (least-recently used) among the eight blocks in a set
Operation latency | Branch delay is three cycles
Coherency enforcement | Software uses a special operation to enforce cache coherency
Cache locking | Up to 1/2 (four out of eight blocks of each set) of the cache contents can be locked; granularity is 64 bytes

Table 5-12. IC_PARAMS field values

Field name | Value
BLOCKSIZE | 64
ASSOCIATIVITY | 8
NUMBER_OF_SETS | 64

Figure 5-8. Format of the instruction-cache parameters register: IC_PARAMS (r/o), MMIO_BASE offset 0x10 0020, with fields BLOCKSIZE, ASSOCIATIVITY, and NUMBER_OF_SETS.
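The relationship between the IC_PARAMS field values and the address partitioning of Table 5-13 can be checked with a few lines of arithmetic. The sketch below is illustrative only: the names ICacheParams, cache_size_bytes, and split_address are hypothetical, and the code models the geometry rather than reading any hardware register.

```python
from collections import namedtuple

# Field values as listed in Table 5-12; the tuple type itself is hypothetical.
ICacheParams = namedtuple("ICacheParams", "blocksize associativity number_of_sets")
PNX1300_ICACHE = ICacheParams(blocksize=64, associativity=8, number_of_sets=64)


def cache_size_bytes(params):
    """Total size is the product of block size, associativity, and set count."""
    return params.blocksize * params.associativity * params.number_of_sets


def split_address(address, params):
    """Split a 32-bit instruction address into (tag, set, offset).

    Assumes blocksize and number_of_sets are powers of two, which holds
    for the values in Table 5-12.
    """
    offset_bits = params.blocksize.bit_length() - 1      # 6 bits for 64 bytes
    set_bits = params.number_of_sets.bit_length() - 1    # 6 bits for 64 sets
    address &= 0xFFFFFFFF
    offset = address & (params.blocksize - 1)
    set_index = (address >> offset_bits) & (params.number_of_sets - 1)
    tag = address >> (offset_bits + set_bits)
    return tag, set_index, offset
```

With the PNX1300 values this yields a 6-bit offset (bits 5..0), a 6-bit set index (bits 11..6), and a 20-bit tag (bits 31..12), matching Table 5-13.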
5.4.3 Miss Processing Order

When a miss occurs, the instruction cache starts filling the requested block from the beginning of the block. The DSPCPU is stalled until the entire block is fetched and stored in the cache.

5.4.4 Replacement Policy

The hierarchical LRU replacement policy implemented by the instruction cache is identical to that implemented by the data cache. See Section 5.3.4, "Replacement Policies, Coherency," for a description of the hierarchical LRU algorithm.

5.4.5 Location of Program Code

All program code must first be loaded into SDRAM. The instruction cache cannot fetch instructions from other memories or devices. In particular, the cache cannot fetch code from on-chip devices or over the PCI bus.

5.4.6 Branch Units

The instruction cache is closely coupled to three branch units. Each unit can accept a branch independently, so three branches can be processed simultaneously in the same cycle.

Branches in PNX1300 are called "delayed branches" because the effect of a successful (taken) branch is not seen in the flow of control until some number of cycles after the successful branch is executed. The number of cycles of latency is called the branch delay. On PNX1300, the branch delay is three cycles.

Although three branches can be executed simultaneously, correct operation of the DSPCPU requires that only one branch be successful (taken) in any one cycle. DSPCPU operation is undefined if more than one concurrent branch operation is successful.

Each branch unit takes four inputs from the DSPCPU: the branch opcode, a guard bit, a branch condition, and a branch target address. A branch is deemed successful if and only if the opcode is a branch opcode, the guard bit is true (i.e., = 1), and the condition (determined by the opcode) is satisfied.
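The success rule and the one-taken-branch-per-cycle constraint can be expressed as a small check, for example in a scheduler or simulator. This is a sketch under stated assumptions: the function names and the tuple representation of a branch are hypothetical, and the branch delay itself is not modeled.

```python
def branch_taken(is_branch_opcode, guard, condition):
    """A branch succeeds only if all three inputs are true."""
    return bool(is_branch_opcode and guard and condition)


def check_issue_cycle(branches):
    """Validate the branches issued in one cycle.

    branches is a list of (is_branch_opcode, guard, condition) tuples,
    one per branch unit in use. Returns the index of the taken branch,
    or None if no branch is taken. Raises ValueError for combinations
    the text describes as unsupported or undefined.
    """
    if len(branches) > 3:
        raise ValueError("only three branch units are available")
    taken = [i for i, b in enumerate(branches) if branch_taken(*b)]
    if len(taken) > 1:
        raise ValueError("more than one taken branch: DSPCPU operation is undefined")
    return taken[0] if taken else None
```

Three branches may therefore share a cycle as long as their guards and conditions are arranged so that at most one of them succeeds.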
5.4.7 Coherency: Special iclr Operation

A program can exercise some control over the operation of the instruction cache by executing the special iclr operation. This operation causes the instruction cache to clear the valid bits for all blocks in the cache, including locked blocks. The LRU replacement status of all blocks is reset to its initial value. The CPU is stalled while iclr is executing.

See Section 5.6, "Cache Coherency," for further discussion of coherency issues.

5.4.8 Reading Tags and Cache Status

The instruction cache supports read access to its tag and status bits, but not through special operations as with the data cache. Since the instruction cache and branch units can execute only resultless operations, access to the instruction-cache tags and status bits is implemented using normal load operations executed by the DSPCPU that reference a special region in the MMIO address aperture. The region is 64 KB long and starts at MMIO_BASE.

Instruction cache tags and status bits are read-only; store operations to this region have no effect. MMIO operations to this special region are only allowed by the DSPCPU, not by any other masters of the on-chip data highway, such as external PCI initiators.

Programmer's note: Tag and status information cannot be read by PCI access, but only by DSPCPU access. Tag and status read cannot be scheduled in the same cycle with or one cycle after an iclr operation.

Reading a tag and valid bit. To read the tag and valid bit for a block in the instruction cache, a program can execute a ld32 operation directed at the instruction-cache region in the MMIO aperture. The top of Figure 5-10 shows the required format for the target address. The most-significant 16 bits must be equal to MMIO_BASE, the least-significant 15 bits select the block (by naming the set and set member), and bit 15 must be set to zero to perform a tag read.

Note that in PNX1300, valid set numbers range from 0 to 63.
Space to encode set numbers 64 to 511 is provided for future extensions.

A ld32 with an address as specified above returns a 32-bit result with the format shown at the top of Figure 5-11. Bit 20 contains the state of the valid bit, and the least-significant 20 bits contain the tag for the block addressed by the ld32.

Reading the LRU bits. To read the LRU bits for a set in the instruction cache, a program can execute a ld32 operation as above but using the address format shown at the bottom of Figure 5-10. In this format, bit 15 is set to one to perform the read of the LRU bits, and the TAG_I_MUX field is set to zeros because it is not needed.

Table 5-13. Instruction address field partitioning

Field | Address bits | Purpose
Offset | 5..0 | Byte offset into a set
Set | 11..6 | Selects one of the sets in the cache (one of 64 in the case of PNX1300)
Tag | 31..12 | Compared against address tags of set members

Figure 5-9. Instruction-cache address partitioning: tag (bits 31..12), set (bits 11..6), offset (bits 5..0).
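The address and result formats used for tag and LRU reads can be sketched as bit-field helpers. The text fixes bit 15 as the tag/LRU selector, the upper 16 bits as MMIO_BASE, and the result layout (valid in bit 20, tag in bits 19..0). The exact positions of TAG_I_MUX and SET inside the low 15 bits are an assumption here, inferred from the stated field capacities (16 ways, 512 sets) and a word-aligned address: TAG_I_MUX in bits 14..11 and SET in bits 10..2. All function names are hypothetical, and the MMIO_BASE value used in the test is only an example.

```python
def _check_fields(tag_i_mux, set_index):
    # PNX1300 constraints: 8 ways and 64 sets are implemented.
    if not 0 <= tag_i_mux <= 7:
        raise ValueError("TAG_I_MUX must be in 0..7 on PNX1300")
    if not 0 <= set_index <= 63:
        raise ValueError("SET must be in 0..63 on PNX1300")


def tag_read_address(mmio_base, tag_i_mux, set_index):
    """Address for a ld32 that reads the tag and valid bit (bit 15 = 0)."""
    _check_fields(tag_i_mux, set_index)
    # Assumed layout: TAG_I_MUX at bits 14..11, SET at bits 10..2.
    return (mmio_base & 0xFFFF0000) | (tag_i_mux << 11) | (set_index << 2)


def lru_read_address(mmio_base, set_index):
    """Address for a ld32 that reads the LRU bits (bit 15 = 1, TAG_I_MUX = 0)."""
    _check_fields(0, set_index)
    return (mmio_base & 0xFFFF0000) | (1 << 15) | (set_index << 2)


def decode_tag_read(word):
    """Return (valid, tag): valid is bit 20, tag is the low 20 bits."""
    return (word >> 20) & 1, word & 0xFFFFF
```

A caller would pass the current MMIO_BASE value, issue the load at the returned address, and hand the 32-bit result to decode_tag_read.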
Reading the LRU bits produces a 32-bit result with the format shown at the bottom of Figure 5-11. The least-significant ten bits contain the state of the LRU bits when the ld32 was executed. See Section 5.6.7, "LRU Bit Definitions," for a description of the LRU bits.

Note that the TAG_I_MUX and SET fields in the address formats of Figure 5-10 are larger than necessary for the instruction cache in PNX1300. These fields will allow future implementations with larger instruction caches to use a compatible mechanism for reading instruction cache information. The TAG_I_MUX field can accommodate a cache of up to 16-way set-associativity, and the SET field can accommodate a cache with up to 512 sets.

For PNX1300, the following constraints on the values of these fields must be observed:

1. 0 <= TAG_I_MUX <= 7
2. 0 <= SET <= 63

5.4.9 Cache Locking

Like the data cache, the instruction cache allows up to one-half of its blocks to be locked. A locked block is never chosen as a victim by the replacement algorithm; its contents remain undisturbed until the locked status is changed explicitly by software. Thus, on PNX1300, up to 16 KB of the cache can be used as a high-speed instruction "ROM." Only four out of eight blocks in any set can be locked.

The MMIO registers IC_LOCK_ADDR, IC_LOCK_SIZE, and IC_LOCK_CTL (shown in Figure 5-12) are used to define and enable instruction locking in the same way that the similarly named data-cache locking registers are used. Section 5.3.7, "Cache Locking," describes the details of cache locking; they are not repeated here.

Setting the IC_LOCK_ENABLE bit (in IC_LOCK_CTL) to "1" causes the following sequence of events:

1. The instruction cache invalidates all blocks in the cache.
2. The instruction cache fetches all blocks in the lock range (defined by IC_LOCK_ADDR and IC_LOCK_SIZE) from main memory into the cache.
3. Cache locking is activated so that the locked blocks cannot be victims of the replacement algorithm.

The only difference between this sequence and the initialization sequence for data-cache locking is that dirty blocks (which cannot exist in the instruction cache) are not written back first.

Programmer's note: Programmers (or compilers) must combine all instructions that need to be locked into the single linear instruction-locking address range.

The special iclr operation also removes locked blocks from the cache. If blocks are locked in the instruction cache, then instruction cache locking should be disabled in software (by writing "0" to IC_LOCK_CTL) before an iclr operation is issued.

Locking should not be enabled by PCI accesses to the MMIO register.

Figure 5-10. Required address format for reading instruction-cache tags and status (one format to read the tag and valid bit, one to read the LRU bits; fields MMIO_BASE, TAG_I_MUX, and SET).

Figure 5-11. Result formats for reads from the instruction-cache region of the MMIO aperture (I-cache tag-read result with VALID and TAG fields; I-cache status-read result with LRU field).

Figure 5-12. Formats of the registers that control instruction-cache locking: IC_LOCK_CTL (r/w), MMIO_BASE offset 0x10 0210, with the IC_LOCK_ENABLE bit and reserved bits; IC_LOCK_ADDR (r/w), offset 0x10 0214, with field IC_LOCK_ADDRESS; IC_LOCK_SIZE (r/w), offset 0x10 0218, with field IC_LOCK_SIZE.

5.4.10 Instruction Cache Initialization and Boot Sequence

When PNX1300 is reset, the instruction cache executes an initialization and processor boot sequence. While reset is asserted, the instruction cache forces NOP operations to the DSPCPU, and the program counter is set to the default value RESET_VECTOR. When reset is deasserted, the initialization and boot sequence is as follows.
1. The stall signal is asserted to prevent activity in the DSPCPU and data cache.
2. The valid bits for all blocks in the instruction cache are reset.
3. At the completion of the block invalidation scan, the stall signal to the DSPCPU and data cache is deasserted.
4. The DSPCPU begins normal operation with an instruction fetch from the address RESET_VECTOR.

The initialization process takes 512 clock cycles. Reset sets RESET_VECTOR equal to DRAM_BASE so that program execution starts at the initial value of DRAM_BASE. The initial value of DRAM_BASE is determined as described in Section 5.2, "DRAM Aperture."

5.5 LRU Algorithm

When a cache miss occurs, the block containing the requested data must be brought into the cache to replace an existing cache block. The LRU algorithm is responsible for selecting the replacement victim by selecting the least-recently-used block.

The 8-way set-associative caches implement a hierarchical LRU replacement algorithm as follows. The eight blocks of a set are partitioned into four groups of two elements each. To select the LRU element:

- First, the LRU pair is selected out of the four pairs using a four-way LRU algorithm.
- Second, the LRU element of the pair is selected using a two-way LRU algorithm.

5.5.1 Two-Way Algorithm

The two-way LRU requires an administration of one bit per pair of elements. On every cache hit to one of the two blocks, the cache writes once to this bit (just a write, not a read-modify-write). If the even-numbered block is accessed, the LRU bit is set to "1"; if the odd-numbered block is accessed, the LRU bit is set to "0". On a miss, the cache replaces the LRU element: if the LRU bit is "0", the even-numbered element will be replaced; if the LRU bit is "1", the odd-numbered element will be replaced.

5.6 Cache Coherency

The PNX1300 hardware does not implement coherency between the caches and main memory.
Generalized coherency is the responsibility of software, which can use the special operations dcb, dinvalid, and iclr to enforce cache/memory synchronization.

5.6.1 Example 1: Data-Cache/Input-Unit Coherency

Before the CPU commands the video-in unit to capture a video frame, the CPU must be sure that the data cache contains no blocks that are in the address region that the video-in unit will use to store the input frame. If the video-in unit performs its input function to an address region and the data cache does hold one or more blocks from that region, any of the following may happen:

- A miss in the data cache may cause a dirty block to be copied back to the address region being used by the video-in unit. If the video-in unit already stored data in the block, the write-back will corrupt the frame data.
- The CPU will read stale data from the cache instead of from the block in main memory. Even though the video-in unit stored new video data in the block in main memory, the cache contents will be used instead because it is still valid in the cache.

To prevent erroneous copybacks or the use of stale data, the CPU must use dinvalid operations to invalidate all blocks in the address region that will be used by the VI unit.

5.6.2 Example 2: Data-Cache/Output-Unit Coherency

Before the CPU commands the video-out unit to send a frame of video, the CPU must be sure that all the data for the frame has been written from the data cache to the region of main memory that the video-out unit will output. Explicit action is necessary because the data cache, with its copyback write policy, will hold an exclusive copy of the data until it is either replaced by the LRU algorithm or the CPU explicitly forces it to be copied back to main memory.

Before an output command is issued to the video-out unit, the CPU must execute dcb operations to force coherency between cache contents and main memory.
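Both examples require software to visit every cache block that overlaps a buffer, issuing dinvalid (Example 1) or dcb (Example 2) for each one. The sketch below only enumerates the block addresses involved; it does not issue any cache operation. It assumes that each operation acts on the single block containing its address and that the data-cache block size is 64 bytes. The function names are hypothetical.

```python
def blocks_in_region(start, length, block_size=64):
    """Return the block-aligned addresses of all blocks overlapping
    the byte range [start, start + length).

    block_size is assumed to be a power of two (64 bytes by default).
    """
    if length <= 0:
        return []
    first = start & ~(block_size - 1)
    last = (start + length - 1) & ~(block_size - 1)
    return list(range(first, last + block_size, block_size))


def is_block_aligned(start, length, block_size=64):
    """True if the region begins and ends on block boundaries, so that
    no block is shared with data outside the region."""
    return start % block_size == 0 and length % block_size == 0
```

A region that is not block-aligned shares its first or last block with neighboring data, so invalidating those blocks can also discard unrelated dirty data. Aligning DMA buffers to the block size avoids this case.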
5.6.3 Example 3: Instruction-Cache/Data-Cache Coherency

If code prepared by a program running on the CPU must be subsequently executed, coherency between the instruction and data caches must be enforced. This is accomplished by a two-step process:

1. Coherency between the data cache and main memory must be enforced since the instruction cache can fetch instructions only from main memory.
2. Coherency between the instruction cache and main memory is enforced by executing an iclr operation.

The CPU will now be able to fetch and execute the new instructions.

5.6.4 Example 4: Instruction-Cache/Input-Unit Coherency

When an input unit is used to load program code into main memory, the iclr operation must be issued before attempting to execute the new code.

5.6.5 Four-Way Algorithm

For administration of the four-way algorithm, the cache maintains an upper-left triangular matrix "R" of 1-bit elements without the diagonal. R contains six bits (in general, n * (n-1)/2 bits for n-way LRU). If set element k is referenced, the cache sets row k to "1" and column k to "0":

R[k, 0..n-1] <- 1, R[0..n-1, k] <- 0

The LRU element is the one for which the entire row is "0" (or empty) and the entire column is "1" (or empty):

R[k, 0..n-1] = 0 and R[0..n-1, k] = 1

For a 4-way set-associative cache, this algorithm requires six bits per set of four cache blocks. On every cache hit, the LRU info is updated by setting three of the six bits to "0" or "1", depending on the set element that was accessed. The bits need only be written; no read-modify-write is necessary. On a miss, the cache reads the six LRU bits to determine the replacement block.

PNX1300 combines the two-way and four-way algorithms into an 8-way hierarchical LRU algorithm. A total of ten administration bits are required: six to maintain the four-way LRU plus four bits to maintain the four two-way LRUs.

The hierarchical algorithm has performance close to full eight-way LRU, but it requires far fewer bits (ten instead of 28) and is much simpler to implement.

To update the LRU bits on a cache hit to element j (with 0 <= j <= 7), the cache applies m = (j div 2) to the four-way LRU administration and (j mod 2) to the two-way administration of pair m. To select a replacement victim, the cache first determines the pair p from the four-way LRU and then retrieves the LRU bit q of pair p. The overall LRU element is p * 2 + q.

5.6.6 LRU Initialization

Reset causes the LRU administration bits to be initialized to a legal state:

R[1,0] <- R[2,0] <- R[3,0] <- 1
R[2,1] <- R[3,1] <- R[3,2] <- 0
2_way[3] <- 2_way[2] <- 2_way[1] <- 2_way[0] <- 0

5.6.7 LRU Bit Definitions

The ten LRU bits per set are mapped as shown in Figure 5-13. This is the format of the LRU field as returned by the special operation rdstatus for the data cache and a ld32 from MMIO space (see Section 5.4.8, "Reading Tags and Cache Status"
) for the instruction cache.

Figure 5-13. LRU bit definitions for LRU bits 0 to 9, covering R[1,0], R[2,0], R[2,1], R[3,0], R[3,1], R[3,2], and 2_way[0] through 2_way[3]; 2_way[k] is the two-way LRU bit of pair k = (j div 2) for set element j.

5.6.8 LRU for the Dual-Ported Cache

For the PNX1300 dual-ported data cache, two memory operations to the same set are possible in a single clock cycle. To support this concurrency, two updates of the LRU bits of a single set must be possible. The following rules are used by PNX1300:

1. LRU bits that are changed by exactly one port receive the value according to the algorithm described above.
2. LRU bits that are changed by both ports receive a value as if the algorithm were first applied for the access in port zero and then for the access in port one.

5.7 Performance Evaluation Support

The caches implement support for performance evaluation. Several events that occur in the caches can be counted using the PNX1300 timer/counters, by selecting the source CACHE1 and/or CACHE2, as described in Section 3.8, "Timers." Two different events can be tracked simultaneously by using 2 timers.

Figure 5-14. Format of the memory_events MMIO register: MEM_EVENTS (r/w), MMIO_BASE offset 0x10 000c, with fields EVENT2 and EVENT1.

The MMIO register MEM_EVENTS determines which events are counted. See Figure 5-14 for the format of MEM_EVENTS. Table 5-14 lists the events that can be tracked and the corresponding values for the MEM_EVENTS fields. EVENT1 selects the actual source
for the timer CACHE1 source. EVENT2 selects the source for timer CACHE2.

If the memory bus is available:

- On a read data cache miss, the minimum waiting time is 12 SDRAM clock cycles if critical word first is granted by the main memory interface (MMI). If not, the data cache waits from 12 to 18 SDRAM cycles (16 SDRAM cycles are required to fetch 64 bytes from SDRAM).
- On a write data cache miss, the missing line needs to be fetched, so it implies the same SDRAM cycles as a read data cache miss. If the victimized cache line is dirty, the cache line is copied back to memory after the read of the missing line is done and thus does not add extra stall cycles.
- Prefetch delay is the same as for a read data cache miss if the memory bus is available. As a reminder, the prefetch may be discarded if the data cache state machine is "full", and there is a 3 stall cycle penalty when the prefetch is issued.

Table 5-14. Trackable cache-performance events

Encoding | Event
0 | No event counted
1 | Instruction-cache misses
2 | Instruction-cache stall cycles (including data-cache stall cycles if both instruction-cache and data-cache are stalled simultaneously)
3 | Data-cache bank conflicts
4 | Data-cache read misses
5 | Data-cache write misses
6 | Data-cache stall cycles (that are not also instruction-cache stall cycles)
7 | Data-cache copyback to SDRAM
8 | Copyback buffer full
9 | Data-cache write miss with all fetch units occupied
10 | Data cache stream miss
11 | Prefetch operation started and not discarded
12 | Prefetch operation discarded (because it hits in the cache or there is no fetch unit available)
13 | Prefetch operation discarded (because it hits in the cache)
14-15 | Reserved

5.8 MMIO Register Summary

Table 5-15 lists the MMIO registers that pertain to the operation of PNX1300's instruction and data caches.

Table 5-15. MMIO register summary

Name | Description
DRAM_BASE | Sets location of the DRAM aperture
DRAM_LIMIT | Sets size of the DRAM aperture
DRAM_CACHEABLE_LIMIT | Divides DRAM aperture into cacheable and non-cacheable portions
MEM_EVENTS | Selects which two events will be counted by timer/counters
DC_LOCK_CTL | Data-cache locking enable and aperture control
DC_LOCK_ADDR | Sets low address of the data-cache address lock aperture
DC_LOCK_SIZE | Sets size of the data-cache address lock aperture
DC_PARAMS | Read-only register with data-cache parameter information
IC_PARAMS | Read-only register with instruction-cache parameter information
IC_LOCK_CTL | Instruction-cache locking enable
IC_LOCK_ADDR | Sets low address of the instruction-cache address lock aperture
IC_LOCK_SIZE | Sets size of the instruction-cache address lock aperture
MMIO_BASE | Sets location of the MMIO aperture
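As a worked example for this chapter, the hierarchical LRU scheme of Section 5.5 and Sections 5.6.5 through 5.6.6 can be modeled in software. The sketch below is a behavioral model for one set, not a description of the hardware implementation. The class and method names are hypothetical, and the matrix R is held as a dictionary keyed by (row, column) with row > column, matching the six stored elements R[1,0] through R[3,2].

```python
class HierarchicalLRU:
    """Ten-bit hierarchical LRU state for one 8-way set."""

    def __init__(self):
        # Reset state as given in Section 5.6.6.
        self.r = {(1, 0): 1, (2, 0): 1, (3, 0): 1,
                  (2, 1): 0, (3, 1): 0, (3, 2): 0}
        self.two_way = [0, 0, 0, 0]

    def touch(self, j):
        """Update the LRU bits for a hit (or fill) of element j, 0 <= j <= 7."""
        if not 0 <= j <= 7:
            raise ValueError("element index must be in 0..7")
        m, q = divmod(j, 2)
        # Four-way update: row m becomes 1, column m becomes 0.
        for other in range(4):
            if other < m:
                self.r[(m, other)] = 1
            elif other > m:
                self.r[(other, m)] = 0
        # Two-way update: even element accessed sets 1, odd sets 0.
        self.two_way[m] = 1 if q == 0 else 0

    def victim(self):
        """Return the element p * 2 + q selected for replacement."""
        for p in range(4):
            row_zero = all(self.r[(p, c)] == 0 for c in range(p))
            col_one = all(self.r[(row, p)] == 1 for row in range(p + 1, 4))
            if row_zero and col_one:
                return p * 2 + self.two_way[p]
        raise RuntimeError("LRU matrix is not in a legal state")
```

Each touch writes three of the six matrix bits and one two-way bit without reading the old state, which mirrors the write-only update described in the text. The model stores 6 + 4 = 10 bits per set, compared with 8 * 7 / 2 = 28 bits for a full eight-way matrix.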
Chapter 6 Video In

by Gert Slavenburg

6.1 Video In Overview

In this document, the generic PNX1300 name refers to the PNX1300 series, or the PNX1300/01/02/11 products.

The Video In (VI) unit provides the following functions:

- Digital video input from a digital camera or analog camera (using a video decoder).
- High-bandwidth (81 MB/sec) raw input data channel.
- Direct 8-10 bit interface for video A/D converters at up to 81-MHz sample rate.
- Receiver port for PNX1300-to-PNX1300 unidirectional message passing.

The VI unit operates in one of the modes per Table 6-1.

Digital video input is in YUV 4:2:2 with 8-bit resolution multiplexed in CCIR656 format (see footnote 1) from a digital camera or CCIR656-capable video decoder (such as the Philips SAA7111 or SAA7113), across an 8-bit-wide interface. Resolutions up to CCIR601 are accepted at 50 or 60 fields per second. A programmable rectangular image is captured from a video frame and written in planar format to PNX1300 SDRAM. The video camera or decoder can be programmed using the PNX1300 I2C bus.

In fullres capture mode, luminance (Y) and chrominance (U, V) pass unmodified. In halfres capture mode, luminance and chrominance are horizontally decimated by a factor of two to convert to CIF-like resolution with YUV 4:2:2 or MPEG sampling rules. If vertical subsampling on chrominance is desired, it can be performed by software on the DSPCPU or by the on-chip Image Coprocessor (ICP).

When operating as a raw input data channel, VI accepts 8-bit-wide data. The operation mode is raw8 capture. No data selection or data interpretation is done. Data is written in packed form, four bytes to a word, to local SDRAM. There is no hardware control over the rate at which the source sends data. Instead, VI maintains two pointer/counter registers to ensure that no data is lost when the local SDRAM memory buffer fills. Data is accepted at the clock of the sender.
If desired, VI_CLK can be programmed as an output to drive the data transfer at a programmable rate.

VI can accept raw data from up to 10-bit A/D converters, at sampling rates up to 81 MHz. VI can operate in raw8, raw10u, or raw10s capture mode for eight-bit, unsigned 10-bit, or signed 10-bit data. In the 10-bit modes, data is zero- or sign-extended to 16 bits and stored in packed form in local SDRAM. As with the raw8 capture mode, VI maintains two pointer/counter registers to ensure that no data is lost when the local SDRAM memory buffer fills. Data is accepted at the externally set sampling rate. If desired, VI_CLK can be programmed as an output to serve as a programmable sampling clock.

VI can act as receiver from the Enhanced Video Out (EVO) unit of another PNX1300. One EVO unit can broadcast to multiple receiving VIs. In this message passing mode, no data selection or data interpretation is done. Each message of the sender is written as byte-packed data to a separate local SDRAM memory buffer. Message start and end is indicated by the sender. The receiving VI will accept data until the sender indicates message end or until the current memory buffer is full. If the memory buffer fills before message end is encountered, the received data is truncated and an error condition is raised.

6.1.1 Interface

Besides the VI-specific pins in Table 6-2, the PNX1300 I2C interface is typically used to control the external camera or video decoder.

Figure 6-1 through Figure 6-4 illustrate typical connections for commonly used external sources. Note that VI_DVALID is only used in special circumstances, e.g. when sending data through a channel that results in clock periods both with and without data transfers.

Table 6-1. VI unit mode selection

Mode | Function | Explanation
0000 | fullres capture | YUV 4:2:2 capture, no decimation
0001 | halfres capture | YUV 4:2:2 capture, decimate by 2
0010 | raw8 capture | Raw 8-bit data capture, pack 4 bytes to a word
0011 | raw10s capture | Raw 10-bit data capture, sign-extend to 16 bits, pack 2 to a word
0100 | raw10u capture | Raw 10-bit data capture, zero-extend to 16 bits, pack 2 to a word
0101 | message passing | Message reception from EVO
0110 .. 1111 | reserved |

1. Refer to CCIR Recommendation 656: Interfaces for digital component video signals in 525-line and 625-line television systems. Recommendation 656 is included in the Philips Desktop Video Data Handbook.
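The mode encodings of Table 6-1 and the 10-bit extension rules of the raw10s and raw10u modes can be illustrated with a short model. This is a sketch for host-side tooling or test benches, not a register-level driver: the names VI_MODES and extend_raw10 are hypothetical, and the order in which two extended samples are packed into a 32-bit word is deliberately not modeled because the text here does not state it.

```python
# Mode encodings from Table 6-1 (4-bit mode field); 0b0110..0b1111 are reserved.
VI_MODES = {
    0b0000: "fullres capture",
    0b0001: "halfres capture",
    0b0010: "raw8 capture",
    0b0011: "raw10s capture",
    0b0100: "raw10u capture",
    0b0101: "message passing",
}


def extend_raw10(sample, signed):
    """Extend a 10-bit sample to a 16-bit value.

    signed=True models raw10s (sign extension of bit 9);
    signed=False models raw10u (zero extension).
    The result is returned as an unsigned 16-bit pattern.
    """
    sample &= 0x3FF
    if signed and sample & 0x200:
        sample |= 0xFC00  # replicate the sign bit into bits 15..10
    return sample
```

For instance, the 10-bit pattern 0x200 (the most negative signed value) becomes 0xFE00 in raw10s mode and stays 0x0200 in raw10u mode.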
6.1.2 Diagnostic Mode

The VI logic can be set to operate in diagnostic mode, which connects the inputs of VI to the outputs of the EVO unit. This mode provides boot diagnostics with the ability to verify major operational aspects of the chip before handing control to an operating system.

Diagnostic mode is entered by writing a control word with a "1" in the DIAGMODE bit position to the VI_CTL register (see Figure 6-11). The EVO unit has to be set up to provide a clock before starting DIAGMODE. After a VI software reset, the DIAGMODE bit has to be set back to "1".

In diagnostic mode, the VI signals are exactly as shown in Figure 6-2, except that the inputs come from the on-chip EVO unit. Note that the inputs are truly taken from the PNX1300 EVO external pins, i.e. if an external (board level) source is driving EVO pins, diagnostic mode is not capable of testing the EVO unit.

Note that the diagnostic mode only controls an input multiplexer. VI can be programmed and operated in all usual modes. The raw modes are particularly attractive for diagnostics purposes, since they allow VI to operate almost as an on-chip logic analyzer.

6.1.3 Power Down and Sleepless

The VI unit enters power down state whenever PNX1300 is put in global power down mode, except if the SLEEPLESS bit in VI_CTL is set. In the latter case, the block continues DMA operation and will wake up the DSPCPU whenever an interrupt is generated.

The VI block can be separately powered down by setting a bit in the BLOCK_POWER_DOWN register. Refer to Chapter 21, "Power Management."

It is recommended that the VI unit be stopped (by negating VI_CTL.CAPTURE_ENABLE) before block-level power down is started, or that SLEEPLESS mode be used when global power down is activated.

6.1.4 Hardware and Software Reset

Video In is reset by a PNX1300 hardware reset (pin TRI_RESET#) or by a VI software reset.
the latter is accomplished by writing a control word of 0x00080000 to the vi_ctl register. after a software reset, allow for 5 video clock cycles delay before enabling vi capture. upon hardware or software reset, the vi_ctl, vi_status, and vi_clock registers are set to all "0"s. the state of the other registers after reset is undefined. note that the vi clock has to be present while applying the software reset.

table 6-2. vi unit interface pins

vi_clk (i/o-5)
• if configured as input (power up default): a positive transition on this incoming video clock pin samples all other vi_data input signals below if vi_dvalid is high. if vi_dvalid is low, vi_data is ignored. clock and data rates of up to 81 mhz are supported. pnx1300 supports an additional mode where vi_data[9:8] in message passing mode are not affected by the vi_dvalid signal (section 6.6.1).
• if configured as output: programmable output clock to drive an external video a/d converter. can be programmed to emit integral dividers of dspcpu_clk.
• see section 6.2 for clock programming details.

vi_dvalid (in-5)
vi_dvalid indicates that valid data is present on the vi_data lines. if high, vi_data will be accepted on the next vi_clk positive edge. if low, no vi_data will be sampled. pnx1300 supports an additional mode where vi_data[9:8] in message passing mode are not affected by the vi_dvalid signal (section 6.6.1).

vi_data[7:0] (in-5)
ccir656 style yuv 4:2:2 data from a digital camera, or general purpose high speed data input pins. sampled on positive transitions of vi_clk if vi_dvalid is high.

vi_data[9:8] (in-5)
extension high speed data input bits to allow use of 10-bit video a/d converters in raw10 modes. vi_data[8] serves as start and vi_data[9] as end message input in message passing mode. sampled on positive transitions of vi_clk if vi_dvalid is high. pnx1300 supports an additional mode where vi_data[9:8] in message passing mode are not affected by the vi_dvalid signal (section 6.6.1).

figure 6-1. vi connected to an 8-bit ccir656 digital camera. (signals shown: data[7:0], clock, sda/scl and gnd on the cable connector, through termination and receivers, to vi_data[7:0], vi_clk, the i2c bus and vss; vi_dvalid at logic "1"; vi_data[9:8] at gnd.)

figure 6-2. vi unit connected to an evo unit of another pnx1300. (signals shown: vo_data[7:0], vo_clk, vo_io1 (stmsg) and vo_io2 (endmsg) of pnx1300 1; vi_data[7:0], vi_clk, vi_data[8] and vi_data[9] of pnx1300 2; vi_dvalid at logic "1".)

figure 6-3. vi unit connected to a video decoder. (signals shown: saa7111 vpo[15:8], llc, scl and sda; pnx1300 vi_data[7:0], vi_clk, iic_scl and iic_sda; vi_dvalid at logic "1"; vi_data[9:8] at gnd; analog inputs 1-2 s-vhs y/c or 1-4 cvbs; 24.576 mhz; i2c bus to other i2c devices.)
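the software reset sequence of section 6.1.4 can be sketched as follows. this is a minimal illustration, not driver code: the reset word 0x00080000 and the vi_ctl offset 0x10 1404 come from the text and figure 6-11, while `write_mmio`, `software_reset` and `reset_settle_ns` are hypothetical names, and the mmio write itself is left to a caller-supplied function.

```python
VI_CTL_OFFSET = 0x101404        # vi_ctl offset from mmio_base (figure 6-11)
VI_SOFTWARE_RESET = 0x00080000  # control word given in section 6.1.4


def reset_settle_ns(vi_clk_hz, cycles=5):
    """Time for `cycles` vi clock periods, in whole nanoseconds, rounded up."""
    if vi_clk_hz <= 0:
        raise ValueError("the vi clock must be running during software reset")
    return -(-cycles * 1_000_000_000 // vi_clk_hz)


def software_reset(write_mmio, vi_clk_hz):
    """Issue the reset word and return how long to wait before enabling capture.

    `write_mmio(offset, value)` is a hypothetical callback supplied by the
    caller; this sketch does not touch hardware itself.
    """
    write_mmio(VI_CTL_OFFSET, VI_SOFTWARE_RESET)
    return reset_settle_ns(vi_clk_hz)


if __name__ == "__main__":
    log = []
    wait = software_reset(lambda off, val: log.append((off, val)), 27_000_000)
    print(hex(log[0][0]), hex(log[0][1]), wait, "ns")
```

with a 27 mhz video clock, five clock periods come to just over 185 ns, so the sketch rounds the wait up to 186 ns. remember that diagmode, if used, has to be set again after the reset.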
6.2 clock generator

the vi block can operate in two distinct clocking modes, as controlled by the vi_clock control register (see figure 6-11).

selfclock = 0: "external clocking mode". this is the most common mode of operation. in this mode, the vi_clk pin is an asynchronous clock input. all other inputs are sampled on positive edges of the vi_clk clock signal. on-chip synchronizers ensure reliable asynchronous capture. this mode can be combined with diagmode, in which case the evo clock acts as the asynchronous clock source. in external clocking mode, the value of divider is ignored.

selfclock = 1: "internal clocking mode". this mode is typically intended for use with external a/d converters or other sources that require a clock. in this mode, vi_clk is an output pin. positive edges of vi_clk are used to sample all other inputs. the generated clock frequency can be programmed using the divider field in the vi_clock register.

on reset, vi_clock is set to zero, i.e. external clocking mode is the default, with divider ignored.

6.3 fullres capture mode

in fullres capture mode, the vi unit receives all three video components y, u, and v, as well as synchronization information (sav and eav codes), on the vi_data[7:0] pins in ccir656 format. see figure 6-8. the three video components y, u, and v are separated into three different streams. each component is written in packed form into separate y, u, and v buffers in the sdram. this is commonly called a planar format [1] (see figure 6-10).

the ccir656 standard specifies that the camera has to obey the sampling rules illustrated in figure 6-5. vi is capable of chrominance resampling, and can produce samples in memory in two ways:

vi_ctl.sc=0: "co-sited sampling" places luminance and chrominance samples in memory without any modification.
hence, a planar format results with sampling positions as per the co-sited luminance and chrominance yuv 4:2:2 convention.

1. the planar format is most suitable as input to software compression algorithms.

figure 6-4. vi connected to a 10-bit video a/d converter. (signals shown: 10-bit video a/d output to vi_data[9:0]; vi_clk; vi_dvalid at logic "1".)

figure 6-5. camera yuv 4:2:2 sampling (co-sited luminance/chrominance).

generated clock frequency in internal clocking mode (section 6.2):

$$f_{\mathrm{viclk}} = \frac{f_{\mathrm{dspcpu}}}{\mathrm{divider}}$$
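as a worked example of the divider relation above, the sketch below picks the smallest integral divider whose output clock does not exceed a requested maximum. the function name, the "do not exceed" selection policy and the example frequencies are assumptions made for illustration; the width of the divider field is not stated in this section, so no upper bound is checked.

```python
def choose_divider(f_dspcpu_hz, f_max_hz):
    """Smallest integer divider with f_dspcpu / divider <= f_max.

    Returns (divider, resulting vi_clk frequency in Hz).
    """
    if f_dspcpu_hz <= 0 or f_max_hz <= 0:
        raise ValueError("frequencies must be positive")
    divider = max(1, -(-f_dspcpu_hz // f_max_hz))  # ceiling division
    return divider, f_dspcpu_hz / divider


if __name__ == "__main__":
    # hypothetical dspcpu clock of 143 MHz, a/d converter limited to 27 MHz
    print(choose_divider(143_000_000, 27_000_000))
```

because only integral dividers of the dspcpu clock are available, the generated clock generally lands below the requested value rather than on it.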
vi_ctl.sc=1: "interspersed sampling" serves to generate a sampling structure in memory where chrominance samples are spatially midway between luminance samples, as shown in figure 6-6. this "interspersed" format is suitable for use in mpeg-1 encoding.

the vi hardware applies a (-1 13 5 -1)/16 filter, as illustrated in figure 6-6, to the chrominance samples before writing them to memory. this filter computes chrominance values at sample points midway between luminance samples [1]. computed video data is clamped to 01h if the filter result is less than 01h and clamped to ffh if greater than ffh. interspersed data format is preferred by some video compression standards. the mpeg-1 standard, for example, requires yuv 4:2:0 data with chrominance sampling positions horizontally and vertically midway between luminance samples. this can be achieved from the horizontally interspersed sampling format by vertical subsampling with a (1 1)/2 or more sophisticated filter. vertical filtering can be performed in software using the dspcpu's efficient multimedia operations or by hardware in the on-chip icp.

1. all filters perform full precision intermediate computations and saturation upon generating the result bits.

figure 6-6. chrominance re-sampling to achieve interspersed sampling. (yuv 4:2:2 ccir656 input samples a, b, c, ... l; resampled sample values:)

$$Y'_g = Y_g$$
$$U_{ef} = \frac{-U_c + 13\,U_e + 5\,U_g - U_i}{16}$$
$$V_{ef} = \frac{-V_c + 13\,V_e + 5\,V_g - V_i}{16}$$

figure 6-7. filtering at the edge of the active area. (active area pixels a, b, c, ... zy, zz, extended by mirrored pixels on both sides.)

figure 6-8. format of ccir656 sav and eav timing reference codes.
preamble: 11111111 00000000 00000000
timing reference code: 1 f v h p p p p
• f = 0 during field 1, f = 1 during field 2
• v = 1 during field blanking, v = 0 elsewhere
• h = 0 for sav, h = 1 for eav
• pppp: protection bits (error correction)

figure 6-9. vi capture parameters. (captured image of width by height at (start_x, start_y) within an incoming field of pixels 0 to m-1 and lines 0 to n-1.)

the filtering process exercises special care at the left and right edges of the active area of the ccir656 data stream, as defined by the sav and eav code positions. see figure 6-7. since no pixels exist to the left of the first pixel or to the right of the last pixel, filtering can result in artifacts. to minimize artifacts, the image is extended by mirroring pixels around the left-most and right-most pixel. note that the image is mirrored around pixel "a", the first pixel after the sav code, and around pixel "zz", the last pixel before the eav [1] code. pixel "a" in figure 6-7 is the (chroma, luma) pair defined by the first three camera bytes of the uyvyuyvy... stream after sav.

refer to figure 6-11 for an overview of the memory mapped i/o (mmio) registers that are used to control and observe the operation of vi in fullres capture mode. to ensure compatibility with future devices, any undefined mmio bits should be ignored when read and written as "0"s.

upon hardware or software reset (section 6.1.4, "hardware and software reset"), the vi_ctl, vi_status, and vi_clock registers are set to all zeros.

at any point in time, the vi_status register fields (see figure 6-11) indicate the current camera status:

• cur_x: the pixel index (0 to m-1) of the most recently received camera pixel. cur_x gets set to zero for the first pixel following receipt of a sav code [2], and incremented on every valid y sample received thereafter.
• cur_y: the line index (0 to n-1) within the current field of the camera line that is currently being received.
cur_y gets set to zero upon receipt of a negative edge of v, i.e., upon the first sav code containing v=0 after one or more sav codes containing v=1. this is equivalent to the first line after the end of vertical retrace. cur_y gets incremented upon every successive sav code.
• field2: indicates whether the field currently being received is a field1 or 2. this flag gets updated based on the f field of every received sav code. note that field1 is the "top" field, i.e. the field containing the top-most visible line. field1 contains lines 1, 3, 5, etc. field2 contains lines 2, 4, 6, 8, etc.

table 6-3 illustrates common digital camera standards and the number of active pixels per line, lines per field, and fields per second. note that any source is acceptable to vi, as long as the maximum vi_clk rate is not exceeded.

table 6-3. common video source parameters.

video source                   m (# active pixels)   n (# active lines)   field rate (hz)
ccir601 50 hz/625 lines        720                   288                  50
ccir601 60 hz/525 lines        720                   240                  60
square pixel 50 hz/625 lines   768                   288                  50
square pixel 60 hz/525 lines   640                   240                  60

figure 6-9 shows the details of an incoming field and the captured image. the incoming field consists of n horizontal lines, each line having m pixels labeled 0 through m-1. lines are numbered from 0 through n-1. the captured image is a subset of the incoming image. it is defined by the capture parameters (start_x, start_y, width, height) held in the vi_cap_start and vi_cap_size mmio registers (see figure 6-11).

• start_x: defines the starting pixel number (x-coordinate of the starting pixel). start_x must be even, and greater than or equal to "0".
• start_y: defines the starting line number (y-coordinate of the starting pixel). start_y must be greater than or equal to "0".
• width: defines the width of the captured image in pixels. width must be even.
• height: defines the height of the captured image in lines.

image capture starts after the following conditions are met:

• vi_ctl.capture enable is asserted.
• vi_status.capture complete is de-asserted, indicating that any previously captured image has been acknowledged.
• cur_y = start_y occurs.

once image capture is started, height "lines" are captured. each line capture starts if:

• the previous line capture, if any, is completed.
• cur_x = start_x

once line capture starts, it continues for 2*width pixel clocks [3] in which vi_dvalid is asserted, irrespective of the presence of one or more eav codes.

1. eav codes with multiple bit errors are accepted and enable the mirroring function.
2. note that vi uses the sav protection bits to implement single error correction and double error detection. an sav code with double error is ignored.
3. four clocks for each cb, y, cr, y group representing two luminance pixels.

note that capture continues regardless of any horizontal or vertical retrace and associated cur_y or cur_x reset. this provides special applications with the ability to capture information embedded inside the horizontal or vertical blanking interval. if it is desirable to capture pixels in the horizontal blanking interval, a minimum time separation of 1 µs is required between the last pixel captured on line y and the first pixel captured on line y+1. an exception to this rule is allowed if and only if the storage parameters below are chosen such that the last and first
pixel end up in adjacent memory locations. note that blanking information capture only makes sense in fullres mode with co-sited sampling. all other modes apply filtering, which will distort the numeric sample values.

the captured image is stored in sdram at a location defined by the storage parameters in mmio registers (y_base_adr, y_delta, u_base_adr, u_delta, v_base_adr, v_delta). note that the base-address registers force alignment to 64-byte boundaries (the six lsbs are always zero). the default memory packing is big-endian, although little-endian packing is also supported by setting the little_endian bit in the vi_ctl register.

• y_base_adr: the desired starting (byte) address in sdram memory where the first y (luminance) sample of the captured image will be stored. this address is forced to be 64-byte aligned (six lsbs always "0").
• y_delta: the desired address difference between the last sample of a line and the address of the first sample on the next line. note that the value of y_delta must be chosen so that all line-start addresses are 64-byte aligned.
• u_base_adr, u_delta, v_base_adr, v_delta: same functions and alignment restrictions as above, but for chrominance-component samples.

horizontally adjacent samples are stored at successive byte addresses, resulting in a packed form (four 8-bit samples are packed into one 32-bit word). upon horizontal retrace, pixel storage addresses are incremented by the corresponding delta to compute the starting byte address for the next line. note that delta is a 16-bit unsigned quantity. this process continues until height lines of width samples have been stored in memory for luminance (y). for chrominance, height lines of half the width are stored [1]. see figure 6-10.

modifications to y_base_adr, u_base_adr and v_base_adr have no effect until the start of next capture, i.e.
vi hardware maintains a separate pointer to track the current address. modifications to y_delta, u_delta and v_delta do affect the next horizontal retrace. hence, under normal circumstances, the delta variables should not be changed during capture.

1. note that consecutive pixel components of each line are stored in consecutive memory addresses, but consecutive lines need not be in consecutive memory addresses.

figure 6-10. vi yuv 4:2:2 planar memory format. (y plane: height lines of width pixels, pix0 to pix w-1, starting at y_base_adr, lines separated by y_delta. u plane: height lines of width/2 pixels starting at u_base_adr, lines separated by u_delta; repeated for v_base_adr, v_delta.)

when capture is complete, i.e. any internal vi buffers have been flushed and the entire captured image is in local sdram, vi raises the status register flag capture complete. if enabled in the vi_ctl register, this event causes a dspcpu interrupt to be requested. the programmer can determine whether the captured image is a field1 or field2 by inspection of the field2 flag in vi_status. note that the field2 flag changes at the start of the vertical blanking interval of the next field.

the capture complete flag is cleared by writing a word to vi_ctl with a "1" in the capture complete ack bit position. this action has the following effect:

• it tells the hardware that a new y, u, and v dma buffer is available (or the old one has been copied)
• it clears the capture complete flag
• it tells vi to capture the next image

the user can program the y_threshold field to generate pre-completion (or post-completion) interrupts. whenever cur_y reaches y_threshold, the threshold reached flag in the status register is set. if enabled in the vi_ctl register, this event causes a dspcpu interrupt request. the threshold reached flag is cleared by writing a word to vi_ctl with a "1" in the threshold reached ack bit position. note that, due to internal buffering in the vi unit, it is not guaranteed that all samples from lines up to and including cur_y have been written to local sdram upon threshold reached. the implementation guarantees a fixed maximum time of 2 µs between raising the interrupt and completion of all writes to sdram.

the threshold interrupt mechanism works regardless of capture enable. hence, it can also be used to skip a desired number of fields without constant dspcpu polling of vi_status.

if vi internal buffers overflow due to insufficient internal data-highway bandwidth allocation, the highway bandwidth error condition is raised in the vi_status register. if enabled, this causes assertion of a vi interrupt request. capture continues at the correct memory address as soon as the internal buffers can be written to memory, but one or more pixels may have been lost, and the corresponding memory locations are not written. the hbe condition can be cleared by writing a "1" to the highway bandwidth error ack bit in vi_ctl. refer to section 6.7, "highway latency and hbe" for more information.

any interrupt event of vi (capture complete, threshold reached, highway bandwidth error) leads to the assertion of a single vi interrupt (source 9) to the pnx1300 vectored interrupt controller. the interrupt handler routine should check the status register to determine the set of vi events associated with the request. the vectored interrupt controller should always be set to have vi (source 9) operate in level sensitive mode. this ensures that each event is handled.

vi asserts the interrupt request line as long as one or more enabled events are asserted. the interrupt handler clears one or more selected events by writing a "1" to the corresponding ack field in vi_ctl. the clearing of the last event leads to immediate (next dspcpu clock edge) de-assertion of the interrupt request line to the vectored interrupt controller. see section 3.5.3, "int and nmi (maskable and non-maskable interrupts),"
for information on how to program interrupt handler routines.

figure 6-11. yuv capture view of vi mmio registers. (registers are 32 bits wide, bit 31 to bit 0; offsets are relative to mmio_base; unused bits are reserved and shown as "0".)

• vi_status (r), 0x10 1400: cur_y (12 bits), cur_x (12 bits), field2, threshold reached, capture complete, hbe (highway bandwidth error)
• vi_ctl (r/w), 0x10 1404: y_threshold, mode, capture complete int enable, threshold reached int enable, hbe int enable, capture complete ack, threshold reached ack, highway bandwidth error ack (write "1" to ack), sc (sampling conventions: 0 = co-sited, 1 = interspersed), little endian, capture enable, software reset, diagmode, sleepless
• vi_clock (r/w), 0x10 1408: divider, selfclock
• vi_cap_start (r/w), 0x10 140c: start_y, start_x
• vi_cap_size (r/w), 0x10 1410: width, height
• vi_y_base_adr (r/w), 0x10 1414: y_base_adr
• vi_u_base_adr (r/w), 0x10 1418: u_base_adr
• vi_v_base_adr (r/w), 0x10 141c: v_base_adr
• vi_uv_delta (r/w), 0x10 1420: u_delta (16 bits), v_delta (16 bits)
• vi_y_delta (r/w), 0x10 1424: y_delta (16 bits)
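the interspersed chrominance resampling of section 6.3 (figure 6-6) can be illustrated in software. the sketch below applies the (-1 13 5 -1)/16 taps to one line of co-sited u or v samples, mirrors the line at both ends and clamps the result to 01h..ffh as the text describes. three details are assumptions rather than statements from this data book: the function name, round-to-nearest before the division by 16, and the exact index mapping used for mirroring (about the first chroma site on the left, and about the last luminance pixel on the right).

```python
def intersperse_chroma(line):
    """Resample co-sited chroma samples of one line to interspersed positions.

    `line` holds the u (or v) samples of one video line, one per two luma pixels.
    Output sample k = (-c[k-1] + 13*c[k] + 5*c[k+1] - c[k+2]) / 16.
    """
    n = len(line)
    if n < 2:
        raise ValueError("need at least two chroma samples")

    def tap(i):
        if i < 0:            # left edge: mirror about the first pixel
            return line[-i]
        if i >= n:           # right edge: mirror about the last luma pixel (assumed mapping)
            return line[2 * n - 1 - i]
        return line[i]

    out = []
    for k in range(n):
        acc = -tap(k - 1) + 13 * tap(k) + 5 * tap(k + 1) - tap(k + 2)
        value = (acc + 8) >> 4               # rounding mode is an assumption
        out.append(min(0xFF, max(0x01, value)))  # clamp to 01h..ffh
    return out


if __name__ == "__main__":
    print(intersperse_chroma([128] * 8))
    print(intersperse_chroma([0, 0, 0, 0]))
```

the taps sum to 16, so flat areas pass through unchanged; the clamp only matters near strong edges or for out-of-range input such as 00h, which is mapped to 01h.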
6.4 halfres capture mode

halfres capture mode is identical in operation to fullres capture mode except that horizontal resolution is reduced by a factor of two on both luminance and chrominance data.

referring to figure 6-9 and figure 6-11, if vi is programmed to capture height lines of width pixels in halfres mode, the resulting captured planar data is as shown in figure 6-12. note that width/2 luminance and width/4 chrominance samples are captured. in this mode, start_x and width must be multiples of four.

figure 6-12. vi halfres planar memory format. (y plane: height lines of width/2 pixels, pix0 to pix w/2-1, starting at y_base_adr, lines separated by y_delta. u plane: height lines of width/4 pixels starting at u_base_adr, lines separated by u_delta; repeated for v_base_adr, v_delta.)

horizontal-resolution reduction is performed as shown in figure 6-13 or figure 6-14. the spatial sampling conventions of the pixels in memory depend on the sc (sampling convention) bit in the vi_ctl register. assuming that the camera sampling positions obey the conventions shown in figure 6-5, two possible spatial formats are supported in memory:

• if sc=0, co-sited luminance and chrominance samples result, as shown in figure 6-13. this corresponds to the standard yuv 4:2:2 sampling conventions.
• if sc=1, interspersed chrominance samples result, as shown in figure 6-14. this form is (after vertical subsampling of the chroma components) identical to the mpeg-1 sampling conventions. if vertical subsampling is desired, it can either be performed in software on the dspcpu or in hardware by the icp.

figure 6-13. halfres co-sited sample capture. (yuv 4:2:2 ccir656 input samples a, b, c, ... l; halfres capture sample results:)

$$U'_f = \frac{-3\,U_c + 19\,U_e + 19\,U_g - 3\,U_i}{32}$$
$$V'_f = \frac{-3\,V_c + 19\,V_e + 19\,V_g - 3\,V_i}{32}$$
$$Y'_h = \frac{-3\,Y_e + 19\,Y_g + 32\,Y_h + 19\,Y_i - 3\,Y_k}{64}$$

figure 6-14. halfres interspersed sample capture. (yuv 4:2:2 ccir656 input samples a, b, c, ... l; halfres capture sample results:)

$$Y'_g = \frac{-3\,Y_d + 19\,Y_f + 32\,Y_g + 19\,Y_h - 3\,Y_j}{64}$$
$$U'_f = \frac{-3\,U_c + 19\,U_e + 19\,U_g - 3\,U_i}{32}$$
$$V'_f = \frac{-3\,V_c + 19\,V_e + 19\,V_g - 3\,V_i}{32}$$

the filtering process applies mirroring at the edge of the active video area, as per figure 6-7. for both filters, computed video data is clamped to 01h if the result of the filter is less than 01h and clamped to ffh if greater than ffh.

6.5 raw capture modes

all raw capture modes (raw8, raw10s and raw10u) behave similarly. vi_data information is captured at the rate of the sender's clock, without any interpretation or start/stop of capture on the basis of the data values. any clock cycle in which vi_dvalid is asserted leads to the capture of one data sample. samples are 8 or 10 bits long (raw8 versus raw10 modes). for the 8-bit capture mode, four samples are packed to a word. for the 10-bit capture modes, two 16-bit samples are packed to a word. the extension from 10 to 16 bits uses sign extension (raw10s) or zero extension (raw10u).
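the sample packing just described can be modelled as a byte stream in memory. the sketch below is illustrative only: the function names are hypothetical, and memory is modelled as successive samples at increasing byte addresses, with the byte order of each 16-bit raw10 sample passed in as a parameter that stands in for the vi little endian control bit.

```python
def extend10(sample, signed):
    """Extend a 10-bit sample to 16 bits: sign extension (raw10s) or zero extension (raw10u)."""
    sample &= 0x3FF
    if signed and sample & 0x200:   # bit 9 is the sign bit of a 10-bit value
        return sample | 0xFC00
    return sample


def pack_raw8(samples):
    """raw8: one byte per sample, four samples per 32-bit word."""
    return bytes(s & 0xFF for s in samples)


def pack_raw10(samples, signed, little_endian):
    """raw10: one 16-bit value per sample, two samples per 32-bit word."""
    order = "little" if little_endian else "big"
    return b"".join(extend10(s, signed).to_bytes(2, order) for s in samples)


if __name__ == "__main__":
    print(pack_raw10([0x001, 0x200], signed=True, little_endian=False).hex())
    print(pack_raw10([0x001, 0x200], signed=False, little_endian=False).hex())
```

the only difference between raw10s and raw10u is the upper six bits of each stored 16-bit value, so a buffer captured in one mode can be reinterpreted in software if the wrong mode was chosen.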
for 8-bit and 16-bit capture, successive captured values are written to increasing memory addresses. for 16-bit capture, the byte order with which the 16-bit data is written to memory is governed by the little endian bit. the vi little endian bit should be set the same as the dspcpu endianness (pcsw.bsx). this ensures that the dspcpu sees correct 16-bit data.

figure 6-15 illustrates the "raw-mode" view of the vi mmio registers.

figure 6-15. raw and message passing modes view of vi mmio registers. (registers are 32 bits wide, bit 31 to bit 0; offsets are relative to mmio_base; unused bits are reserved and shown as "0".)

• vi_status (r), 0x10 1400: buf1active, buf1full, buf2full, overflow (message mode only), overrun, highway bandwidth error
• vi_ctl (r/w), 0x10 1404: mode, interrupt enables (buf1full, buf2full, ovf, ovr), highway bandwidth error int enable, ack1, ack2, ack_ovf, ack_ovr, highway bandwidth error ack, little endian, capture enable, software reset, diagmode, sleepless
• vi_clock (r/w), 0x10 1408: divider, selfclock, valid
• vi_base1 (r/w), 0x10 1414: base1
• vi_base2 (r/w), 0x10 1418: base2
• vi_size (r/w), 0x10 141c: size (in samples)

figure 6-16 shows the major vi states associated with raw-mode capture. the initial state is reached on software or hardware reset as described in section 6.1.4, "hardware and software reset". upon reset, all status and control bits are set to "0". in particular, capture_enable is set to "0" and no capture takes place.

once the software has programmed base1 and base2 (with the start addresses of two sdram buffer areas [1])
and size (in number of samples), it is safe to enable capture by setting capture_enable. note that size is in samples and must be a multiple of 64, hence setting a minimum buffer size of 64 bytes for raw8 mode and 128 bytes for raw10 modes.

at this point, buffer1 is the active capture buffer. data is captured in buffer1 until capture is disabled or until size samples have been captured. after every sample, a running address pointer is incremented by the sample size (one or two bytes). if size samples have been captured, capture continues (without missing a sample) in buffer2. at the same time, buf1full is asserted. this causes an interrupt on the dspcpu, if enabled by buf1full interrupt enable. buffer2 is now the active capture buffer and behaves as described above.

in normal operation, the dspcpu will respond to the buf1full event by assigning a new base1 and (optionally) size and performing an ack1. if the dspcpu fails to assign a new buffer1 and perform an ack1 before buffer2 also fills up, the overrun condition is raised and capture stops. capture continues upon receipt of an ack1, ack2, or both, regardless of the overrun state. the buffer in which capture resumes is as indicated in figure 6-16. the overrun condition is "sticky" and can only be cleared by software, by writing a "1" to the ack_ovr bit in the vi_ctl register.

if insufficient bandwidth is allocated from the internal data highway, the vi internal buffers may overflow. this leads to assertion of the highway bandwidth error condition. one or more data samples are lost. capture resumes at the correct memory address as soon as the internal buffer is written to memory. the hbe error condition is sticky. it remains asserted until it is cleared by writing a "1" to highway bandwidth error ack. refer to section 6.7, "highway latency and hbe."
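the double-buffer behavior described above can be mimicked with a small software model, which may help when reasoning about how quickly the dspcpu has to acknowledge a full buffer. this is a sketch under stated assumptions, not a description of the hardware implementation: the class and method names are hypothetical, and the rule that capture resumes in the buffer that was just acknowledged is an assumption (the data book defers that detail to figure 6-16).

```python
class RawCaptureModel:
    """Toy model of vi raw-mode double buffering (buffer1/buffer2, overrun)."""

    def __init__(self, size, bytes_per_sample=1):
        if size <= 0 or size % 64:
            raise ValueError("size must be a positive multiple of 64 samples")
        self.size = size
        self.buffer_bytes = size * bytes_per_sample  # 1 for raw8, 2 for raw10
        self.active = 1                  # buffer1 is active after enabling capture
        self.count = 0                   # samples in the active buffer
        self.full = {1: False, 2: False}
        self.overrun = False             # sticky, cleared only by ack_ovr()
        self.stopped = False

    def sample(self):
        """Account for one incoming sample; returns False if it was not captured."""
        if self.stopped:
            return False
        self.count += 1
        if self.count == self.size:
            self.full[self.active] = True
            self.count = 0
            other = 2 if self.active == 1 else 1
            if self.full[other]:         # other buffer was never acknowledged
                self.overrun = True
                self.stopped = True
            else:
                self.active = other
        return True

    def ack(self, buf):
        """Model ack1 / ack2: release a buffer and resume capture if stopped."""
        self.full[buf] = False
        if self.stopped:
            self.active = buf            # assumption: resume in the acknowledged buffer
            self.stopped = False

    def ack_ovr(self):
        self.overrun = False


if __name__ == "__main__":
    vi = RawCaptureModel(size=64, bytes_per_sample=2)
    for _ in range(128):
        vi.sample()
    print(vi.buffer_bytes, vi.overrun, vi.sample())
```

in the model, as in the text, the overrun flag stays set after capture resumes; only the explicit ack_ovr clears it.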
note that vi hardware uses copies of the base and size registers once capture has started. modifications of base or size, therefore, have no effect until the start of the next use of the corresponding buffer. note also that the vi_base1 and vi_base2 addresses must be 64-byte aligned (the six lsbs are always "0").

1. sdram buffers must start on a 64-byte boundary.

figure 6-16. vi raw mode major states. (states: active = buf1; active = buf2 with buf1full; active = buf1 with buf2full; buf1full and buf2full with overrun raised*. transitions: reset, buffer1 full, buffer2 full, ack1, ack2, ack1 & ~ack2, ack1 & ack2, ~ack1 & ack2.)
* overrun is a sticky flag. it is set but does not affect operation. it can only be cleared by software, by writing a "1" to ack_ovr. (see text in section 6.5)

6.6 message-passing mode

in this mode, vi receives 8-bit message data over the vi_data[7:0] pins. the message data is written in packed form (four 8-bit message bytes per 32-bit word) to sdram. message data capture starts on receipt of a start event on vi_data[8]. message data is received until endofmessage (eom) is received on vi_data[9] or the receive buffer is full. note that the vi_size mmio register determines the buffer size, and hence the maximum message length. it should not be changed without a vi (soft) reset.

figure 6-17 illustrates an example of an 8-byte message transfer. the first byte (d0) is sampled on the rising edge of the vi_clk clock after a valid start was sampled on the preceding rising clock edge. the last byte (d7) is sampled on the rising clock edge where eom is sampled asserted.

the message passing mode view of the vi mmio registers is shown in figure 6-15. the major states are shown in figure 6-18. the operation is almost identical to the operation in raw-capture mode, except that transitions to another active buffer occur upon receipt of eom rather than on buffer full. overrun is raised if the second buffer receives a complete message before a new buffer is assigned by the dspcpu.

overflow is raised if a buffer is full and no eom has been received. if enabled, it causes a dspcpu interrupt. since digital interconnection between devices is reliable, overflow is indicative of a protocol error between the two pnx1300s involved in the exchange (failure to agree on message size). detection of overflow leads to total halt of capture of this message. capture resumes in the next buffer upon receipt of the next start event on vi_data[8]. the overflow flag is sticky and can only be cleared by writing a "1" to ack_ovf.

highway bandwidth error behavior in message passing mode is identical to that of raw mode.

6.6.1 vi_dvalid in message passing mode

pnx1300 offers a new mode where the vi_dvalid pin does not control the sampling of the vi_data[9:8] pins. these pins are used for end and start of a message. this new mode is controlled by a new field, valid, in the vi_clock mmio register. the default value after reset is "0".

when vi_clock.valid is set to "0" (the reset value), pnx1300 behaves as in tm-1300. in this case the start and end of messages are sampled only if the vi_dvalid pin is high.

when vi_clock.valid is set to "1", pnx1300 activates the new behavior. in this case the start and end of messages are always sampled, independently of the state of the vi_dvalid pin.

vi_clock.valid cannot be read back; it always reads 0.
figure 6-17. vi message passing signal example. (waveforms: vi_clk; vi_data[7:0] carrying xx, d0, d1, d2, d3, d4, d5, d6, d7, xx, xx; vi_data[8] start of message; vi_data[9] end of message.)

figure 6-18. vi message passing mode major states. (states: active = buf1; active = buf2 with buf1full; active = buf1 with buf2full; buf1full and buf2full with overrun raised*. transitions: reset, eom, ack1, ack2, ack1 & ~ack2, ack1 & ack2, ~ack1 & ack2; no eom: raise overflow*, see text in section 6.6.)
* overrun and overflow are sticky flags. they are set, but do not affect operation. they can only be cleared by software, by writing a "1" to ack_ovr or ack_ovf. (see text in section 6.6)
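the framing rules of section 6.6 (figure 6-17) can be expressed as a short receiver model. it is a sketch only: the function name and the tuple layout are hypothetical, vi_dvalid is assumed to be held high on every clock (as in figure 6-2), a start seen while a message is already being received is ignored, and the bytes of a message that overflows are simply dropped from the returned list.

```python
def receive_messages(clocks, size):
    """Extract messages from per-clock samples of (vi_data[7:0], start, eom).

    `clocks` lists the values sampled on successive rising vi_clk edges.
    `size` is the receive buffer size in bytes (the vi_size register).
    Returns (list of complete messages, overflow flag).
    """
    messages = []
    overflow = False          # sticky, like the overflow status flag
    receiving = False
    buf = bytearray()
    for data, start, eom in clocks:
        if receiving:
            buf.append(data & 0xFF)
            if eom:                       # last byte arrives together with eom
                messages.append(bytes(buf))
                receiving = False
            elif len(buf) == size:        # buffer full and no eom received
                overflow = True
                receiving = False         # capture of this message halts
        elif start:
            receiving = True              # first byte is taken on the next edge
            buf = bytearray()
    return messages, overflow


if __name__ == "__main__":
    # the 8-byte transfer of figure 6-17: start, then d0..d7 with eom on d7
    wave = [(0, 1, 0)] + [(d, 0, 0) for d in range(7)] + [(7, 0, 1), (0, 0, 0)]
    print(receive_messages(wave, size=64))
```

running the same waveform with a receive buffer smaller than the message shows the protocol error case: no message is returned and the overflow flag is set.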
6.7 highway latency and hbe

refer to chapter 20, "arbiter," for a description of the arbiter terminology used here.

the vi unit uses internal buffering before writing data to sdram. there are two internal buffers, each 16 entries of 32 bits. in fullres mode, each internal buffer is used for 128 y samples, 64 u samples, and 64 v samples. once the first internal buffer is filled, 4 highway transactions must occur before the second buffer fills completely. hence, the requirement for not losing samples is:

• 4 requests must be served within 256 vi clock cycles.

for the typical ccir601-resolution ntsc or pal 27-mhz vi clock rate, the latency requirement is 4 requests in 9481 ns (256000/27). this corresponds to one request every 2370 ns or, with a pnx1300 sdram clock speed of 100 mhz, every 237 sdram clock cycles. the one-request latency is used to define the priority raising value (see section 20.6.3).

in halfres mode, the y, u, and v decimation by 2 takes place before writing to the internal buffers. so, the requirement for not losing samples is:

• 4 requests served within 512 vi clock cycles.

for halfres subsampling, an ntsc or pal 27-mhz vi clock rate and a pnx1300 sdram clock speed of 100 mhz, latency is 4 requests in 512000/27 = 18963 ns (1896 highway clock cycles) or one request every 4740 ns (474 sdram clock cycles).

for raw8 capture and message passing modes, each internal buffer stores 64 samples at the incoming vi clock rate. the latency requirement is one request served every 64 vi clock cycles.

for the raw10 capture modes, each internal buffer stores 32 samples. hence, the requirement for not losing samples is one request served every 32 vi clock cycles. for a 38-mhz data rate on the incoming 10-bit samples and a pnx1300 sdram clock speed of 100 mhz, highway latency should be set to guarantee less than 32000/38 = 842 ns (84 sdram clock cycles) per request.
this cannot be met if any other peripherals are enabled.

table 6-4 summarizes the maximum allowed highway latency (in sdram clock cycles) needed to guarantee that no samples are lost. the general formula uses "f" to represent the vi clock frequency (in mhz).

in fullres mode, the bandwidth requirement (in bytes) per video line with active image for vi is:

$$B_{\mathrm{fullr}} = \mathrm{ceil}\!\left(\frac{2 \cdot \mathrm{width}}{256}\right) \cdot 4 \cdot 64$$

where ceil(x) is the least integral value greater than or equal to x. in halfres mode, the bandwidth is:

$$B_{\mathrm{halfr}} = \mathrm{ceil}\!\left(\frac{2 \cdot \mathrm{width}}{512}\right) \cdot 4 \cdot 64$$

raw8 mode and message passing mode bandwidth depends only on vi clock speed. for raw10 mode, each 10-bit value counts as 2 bytes for bandwidth computations.

table 6-4. vi highway latency requirements (27-mhz data rate, 100-mhz pnx1300 highway clock)

mode              max latency setting (27 mhz, 100 mhz)   formula
fullres capture   237                                     6,400/f
halfres capture   474                                     12,800/f
raw8              237                                     6,400/f
raw10s            118                                     3,200/f
raw10u            118                                     3,200/f
message passing   237                                     6,400/f
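the entries of table 6-4 and the per-line bandwidth formulas can be reproduced with a few lines of arithmetic. the sketch below is only a calculator for the relations stated in this section; the function names are hypothetical, and scaling the table formulas to an sdram clock other than 100 mhz is an extrapolation that is not made in the text.

```python
import math

# vi clock cycles available per highway request, from section 6.7
# (for example fullres: 4 requests per 256 vi clocks = 64 clocks per request)
VI_CLOCKS_PER_REQUEST = {
    "fullres": 64,
    "halfres": 128,
    "raw8": 64,
    "raw10s": 32,
    "raw10u": 32,
    "message": 64,
}


def max_latency_cycles(mode, f_vi_mhz, f_sdram_mhz=100.0):
    """Maximum highway latency in sdram clock cycles (table 6-4 uses 100 mhz)."""
    return math.floor(VI_CLOCKS_PER_REQUEST[mode] * f_sdram_mhz / f_vi_mhz)


def line_bandwidth_bytes(width, halfres=False):
    """Bytes transferred per active video line: ceil(2*width / 256 or 512) * 4 * 64."""
    block = 512 if halfres else 256
    return -(-2 * width // block) * 4 * 64


if __name__ == "__main__":
    for mode in VI_CLOCKS_PER_REQUEST:
        print(mode, max_latency_cycles(mode, 27.0))
    print(line_bandwidth_bytes(720), line_bandwidth_bytes(720, halfres=True))
```

for a 720-pixel ccir601 line this gives 1536 bytes in fullres mode and 768 bytes in halfres mode, slightly more than the 1440 and 720 bytes of payload because transfers are counted in whole groups of four 64-byte requests.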
chapter 7: enhanced video out

by marc duranton, dave wyland, gert slavenburg

7.1 enhanced video out summary

in this document, the generic pnx1300 name refers to the pnx1300 series, or the pnx1300/01/02/11 products.

the pnx1300 enhanced video out (evo) improves on the design of the tm-1000 video out (vo) unit while maintaining binary compatibility. pnx1300 evo is fully backward compatible with tm-1100, and has been extended to support byte data rates up to 81 mhz and to improve the genlock mode. the summary of new evo features versus tm-1000 includes:

• internal clock generator (dds) has reduced jitter
• full alpha blending supports 129 levels
• chroma keying
• frame synchronization can be internally or externally generated (genlock mode)
• external frame sync follows the field number generated in the eav/sav code
• programmable yuv output clipping
• data-valid signal generated in data-streaming mode
• in message passing mode, message length can range from one word (4 bytes) up to 16 mb.

7.2 about this document

this chapter describes the pnx1300 evo unit, which extends and improves the design of the tm-1000 vo unit, and consolidates the changes introduced in the tm-1100. please refer to the tm-1000 databook for a description of the vo unit's functionality.

7.3 backward compatibility

the evo is functionally compatible with the tm-1000 vo unit. all tm-1000 vo features are supported in exactly the same fashion by the pnx1300 evo. software written for the tm-1000 vo can control the pnx1300 evo without modification (with the exception of the genlock mode, which now requires evo_ctl.genlock to be set to 1 in addition to vo_ctl.sync_master = 0).

all new features (with respect to tm-1000) and improvements are selectively enabled by setting bits in the evo_ctl mmio register, described in section 7.16.4. a method to determine the existence of evo registers is given in section 7.16.1.
the pnx1300 evo features are disabled on hardware reset in order to remain hardware-compatible with the tm-1000 vo. it is therefore assumed throughout this chapter that all new functions controlled by evo_ctl are enabled by software. any new software should use the new evo modes.

7.4 function summary

the pnx1300 evo generates and transmits continuous digital video images. it can connect to an off-chip video subsystem such as a digital video encoder chip (e.g., the philips saa7125 denc digital encoder), a digital video recorder, or the video input of another pnx1300 through a ccir 656-compatible byte-parallel video interface. see figure 7-1, figure 7-2, and figure 7-3.

the evo can either supply video pixel clock and synchronization signals to the external interface or synchronize to signals received from the external interface (genlock mode).

pal, ntsc, 16:9 and other video formats, including double pixel-rate, non-interlaced video formats, are supported through programmable registers which control pixel clock frequency and video field or frame format.

the evo can combine a background video image from sdram with an optional foreground graphics overlay image from sdram using 129-level, per-pixel alpha blending. the composite result is sent out as continuous video. video image data is taken from a planar memory format, with separate y, u and v planes in memory in yuv 4:2:2 or 4:2:0 format. the optional graphics overlay is taken from a pixel-packed yuv 4:2:2+alpha data structure in memory.

the evo can also be used to stream continuous data (data-streaming mode) or send unidirectional messages (message-passing mode) from one pnx1300 to another.

in data-streaming mode, the evo generates a continuous stream of arbitrary byte data using internal or external clocking. dual buffers allow continuous data streaming in this mode by allowing the dspcpu to set up a buffer while another is being emptied by the evo.
data-valid signals are generated on vo_io1 and vo_io2 to synchronize data streaming to other pnx1300 data receivers.

in message-passing mode, unidirectional messages can be sent to the video in (vi) port(s) of one or more pnx1300s. start and end-of-message signals are provided to synchronize message passing to other pnx1300 message receivers.

7.4.1 detailed feature descriptions

the evo provides the following key functions.

- continuous digital video output of pal or ntsc format data according to ccir 601.
- transmission of yuv 4:2:2 co-sited pixel data across a standard 8-bit parallel ccir 656 interface (see footnote 1). embedded sav and eav synchronization codes and separate sync control signals compatible with philips denc encoders are available.
- supports the nominal pal/ntsc data rate of 27 mb/sec (13.5 mpix/sec), or any byte data rate up to an 81-mhz evo clock.
- custom video formats can be programmed with frames or fields of up to 4095 lines of up to 4095 pixels, subject only to the data rate limitation above.
- support for video images in planar yuv 4:2:2 co-sited, planar yuv 4:2:2 interspersed, or planar yuv 4:2:0 memory formats.
- optional 129-level alpha blending. the graphics overlay image is in pixel-packed yuv 4:2:2+alpha format, and is alpha blended on top of the video image. each pixel has a 1-bit alpha, which selects one of two global 8-bit alpha values which provide 129 levels of transparency. with overlay enabled, the output byte data rate is limited to 45% of the sdram clock, or up to an 81-mhz evo clock, whichever is smaller.
- optional horizontal 2x upscaling of the video image for display. the overlay is always in display format.
- in data-streaming mode, the evo acts as a high bandwidth continuous-output data channel. the byte data rate is limited to an 81-mhz evo clock.
- in message-passing mode, the evo can send messages from 1 word (4 bytes) up to 16 mb. the byte data rate is limited to an 81-mhz evo clock.
- for diagnostic purposes, evo output data can be internally looped back to the vi port. this is controlled by the vi diagmode bit.

7.4.2 summary of operation

the evo normally supplies continuous video data to its outputs.
the evo is programmed and started by the pnx1300 dspcpu. the evo issues an interrupt to the dspcpu at the end of each transmitted field, and/or at a programmable vertical position in the field. the dspcpu updates the evo video image data pointers with pointers to the next field during the vertical blanking interval so as to maintain continuous video output. during video output, the evo supplies embedded ccir 656 sav (start active video) and eav (end active video) sync codes and optionally supplies horizontal and frame sync signals. the evo can either supply pixel clock and horizontal and frame timing signals or it can lock to external timing signals such as those supplied by a philips saa7125 denc digital encoder or similar sync source.

7.5 interface

table 7-1 lists the interface pins of the evo unit. figure 7-1, figure 7-2, and figure 7-3 illustrate typical connections for commonly-used external devices that interface to the evo.

the most common way to generate analog video is shown in figure 7-1. in this setup, an saa7125 digital encoder (denc) can be programmed to derive sync either from the vo_data stream eav/sav codes, or from its rcv1/2 pins.

figure 7-2 illustrates how a byte-parallel ecl-level standard ccir 656 interface can be created. in certain professional applications, serial d1 video is also used. in that case, the evo can be connected to a gennum gs9022 digital video serializer or similar part (not shown).

figure 7-3 shows the evo unit of one pnx1300 connected to the vi unit of a second pnx1300.

footnote 1: refer to ccir recommendation 656: interfaces for digital component video signals in 525 line and 625 line television systems. recommendation 656 is included in the philips desktop video data handbook.

[figure 7-1. evo connected to a digital video encoder (denc). pnx1300 signals: vo_data[7:0], vo_io1 (hs), vo_io2 (fs), vo_clk. saa7125 signals: mp[7:0], rcv1, rcv2, llc.]

[figure 7-2. evo connected to a ccir 656 video-output connector. pnx1300 vo_data[7:0] and vo_clk pass through ttl to ecl conversion to a ccir 656 subminiature 'd' connector carrying data a,b[7:0] and clock a,b.]
7.6 block diagram

figure 7-4 shows a block diagram of the evo unit. it consists of a clock generator, a video frame timing generator and an image or data generator. the image generator produces either a ccir 656 digital video data stream with optional yuv overlay or a continuous-data or message-data stream. it also performs optional format conversion and optional 2:1 horizontal scaling.

the frame timing generator provides programmable image timing including horizontal and vertical blanking, sav and eav code insertion, overlay start and end timing, and horizontal and frame timing pulses. it also supplies data-valid timing signals in data-streaming mode and start-of-message and end-of-message timing signals in message-passing mode. the sync timing pulses can be generated by the frame timing unit, or the frame timing unit can be driven by externally-supplied sync timing pulses, when vo_ctl.sync_master = 0 and evo_ctl.genlock = 1.

the video clock generator produces a programmable video clock. the video clock generator can supply the video clock for the frame timing generator and external devices, or it can be driven by an external clock signal.

7.7 clock system

positive edges of vo_clk drive all evo output events. a block diagram of the evo clock system is shown in figure 7-5. the evo clock is either supplied externally or internally generated by the evo, as controlled by the vo_ctl.clkout bit. when clkout = 0, the evo clock is supplied by an external source through the vo_clk pin as an input. this is the default mode, entered at hardware reset. when clkout = 1, an internal clock generator supplies the evo clock and drives the vo_clk pin as an output.

the internal clock generator system is a square wave direct digital synthesizer (dds) which can be programmed to emit frequencies from 1 hz to 50 mhz.
the output of the dds is sent to a phase-locked loop filter (pll) which removes clock jitter from the dds output signal. the pll can also be used to divide or double the dds frequency. the pll vco operates from 8 mhz to 90 mhz. the pll is enabled and programmed as described in section 7.19.

dds clock rate is set by the vo_clock.frequency field according to the equation shown in figure 7-6. the vo_clk frequency can be a divider or multiplier of f_dds, as determined by the pll subsystem settings.

low-jitter clock mode is automatically entered whenever frequency[31] = 1. if frequency[31] = 0, the dds operates at 1/3 the rate (for compatibility with tm-1000 code), and frequency must be set as shown in figure 7-7.

the dds synthesizer maximum jitter can be computed as follows:

$$\mathrm{jitter} = \frac{1}{9 \cdot f_{dspcpu}}$$

examples of jitter values can be found in table 7-2.

table 7-1. evo unit interface pins

signal name     type    description
vo_data[7:0]    out     ccir 656-style yuv 4:2:2 digital output data, or general-purpose high speed data output channel. output changes on positive edge of vo_clk.
vo_io1          i/o-5   horizontal sync (hs) output or start message (stmsg) output. see figure 7-18.
vo_io2          i/o-5   frame sync (fs) input, fs output or endmsg output.
                        - if set as fs input, it can be set to respond to positive or negative edge transitions.
                        - if the evo operates in genlock mode and the selected transition occurs, the evo sends two fields of video data.
                        - in message-passing mode, this pin acts as the endmsg output. see figure 7-18.
vo_clk          i/o-5   the evo unit emits vo_data on a positive edge of vo_clk. vo_clk can be configured as an input (the hardware reset default) or output.
                        - if configured as an input, vo_clk is received from external display-clock master circuitry.
                        - if configured as output, the pnx1300 emits a low-jitter clock frequency programmable between approx. 4 and 81 mhz.

[figure 7-3. evo unit connected to the vi unit of a second pnx1300. pnx1300 a signals: vo_data[7:0], vo_io1 (stmsg), vo_io2 (endmsg), vo_clk. pnx1300 b signals: vi_data[7:0], vi_data[8], vi_data[9], vi_clk, and vi_dvalid at logic '1'.]

[figure 7-4. evo unit block diagram. blocks: video frame timing generator, video clock generator, image generator, overlay generator, message/data generator. signals: vo_io1 (hs, start msg, or valid data pulse), vo_io2 (vs, end msg, or valid data level), vo_clk, vo_data[0:7], sdram highway.]

[figure 7-5. evo clock system. blocks: square-wave dds (frequency bits 0 to 31, clocked at 9 x cpu clock), pll filter, clkout selection between the vo_clk pin and the internal vo_clk (to frame timing gen.).]

7.8 image timing

the evo emits a serial byte-data stream used by ccir 656 devices to generate a displayed image. figure 7-9 shows an ntsc-compatible, 525-line interlaced image. the field and line numbers are shown for reference.

interlaced images are generated by the display hardware by controlling the vertical retrace timing. for reference, figure 7-8 shows a timing diagram of ntsc-compatible interlaced frame timing illustrating the analog vertical retrace signal. the vertical retrace signal for the second field begins in the middle of the horizontal line that ends the first field. this causes the first line of the second field to begin halfway across the display screen and the lines of the second field to be scanned between the lines of the first field, resulting in an interlaced display. the analog timing required to generate the interlaced signal is supplied by the display device. the ccir 656 digital video signals generated by the evo use frame synchronization timing and do not generate any vertical retrace timing.

7.8.1 ccir 656 pixel timing

the evo generates pixels according to ccir 656 timing in yuv 4:2:2 co-sited format and outputs these pixels as shown in figure 7-10. pixels are generated in groups of two, with four bytes per two pixels. each pair of pixels has two luminance bytes (y0, y1) and one pair of chrominance bytes (u0, v0) arranged in the sequence shown. the chrominance samples u0 and v0 are sampled spatially co-sited with luminance sample y0. for pal or ntsc video, pixels are generated at a nominal rate of 13.5 mpix/sec (27 mb/sec). pixels are clocked out on the positive edge of vo_clk.

7.8.2 ccir 656 line timing

the ccir 656 line timing is shown in figure 7-11. each line begins with an eav code, a blanking interval and an sav code, followed by the line of active video. the eav code indicates end of active video for the previous line, and the sav code indicates start of active video for the current line.

table 7-2. jitter values for common dspcpu mhz

f_dspcpu (mhz)    jitter (nsec)
143               0.777
166               0.669
180               0.617
200               0.555

figure 7-6. dds low-jitter oscillator frequency.

$$\mathrm{frequency} = 2^{31} + \frac{f_{dds} \cdot 2^{32}}{9 \cdot f_{dspcpu}}$$

figure 7-7. dds slow speed oscillator frequency.

$$\mathrm{frequency} = \frac{f_{dds} \cdot 2^{32}}{3 \cdot f_{dspcpu}}$$

[figure 7-8. interlaced timing: ntsc analog sync signals. one frame of lines 1 to 525 split into field 1 and field 2, each with blanking and active video intervals, showing the vertical sync and the 1/2 line interlace offset.]
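The frequency equations of figure 7-6 and figure 7-7 and the jitter formula of section 7.7 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not driver code: the function names are hypothetical, rounding to the nearest integer is an assumption (the text does not say whether to round or truncate), and the pll divide or double stage that sits between f_dds and vo_clk is not modeled.

```python
def dds_frequency_word(f_dds_hz, f_dspcpu_hz, low_jitter=True):
    """Return a 32-bit value for the vo_clock.frequency field.

    low_jitter=True  follows figure 7-6 (frequency[31] = 1).
    low_jitter=False follows figure 7-7 (frequency[31] = 0, tm-1000 style).
    Rounding to nearest is an assumption, not taken from the data book.
    """
    if low_jitter:
        word = (1 << 31) + round(f_dds_hz * (1 << 32) / (9 * f_dspcpu_hz))
    else:
        word = round(f_dds_hz * (1 << 32) / (3 * f_dspcpu_hz))
    if not 0 <= word < (1 << 32):
        raise ValueError("frequency does not fit in 32 bits")
    # Bit 31 selects the mode, so it must agree with the requested mode.
    if (word >> 31) != int(low_jitter):
        raise ValueError("requested f_dds is out of range for this mode")
    return word


def dds_max_jitter_ns(f_dspcpu_hz):
    """Maximum dds jitter in nanoseconds: 1 / (9 * f_dspcpu)."""
    return 1e9 / (9 * f_dspcpu_hz)


if __name__ == "__main__":
    word = dds_frequency_word(27e6, 143e6)
    print(hex(word))
    print(round(dds_max_jitter_ns(143e6), 3))
```

The jitter helper reproduces the values of table 7-2 to within the table's three-digit precision, which is a useful sanity check on the 9 x f_dspcpu factor.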
7.8.3 sav and eav codes

the end active video (eav) and start active video (sav) codes are issued at the start of each video line. eav and sav codes have a fixed format: a 3-byte preamble of 0xff, 0x00, 0x00 followed by the sav or eav code byte. the eav and sav code byte format is shown in figure 7-12 for reference. the eav and sav codes define the start and end of the horizontal blanking interval, and they also indicate the current field number and the vertical blanking interval.

[figure 7-9. interlaced display: 525-line, 60-hz image. field 1 covers lines 20 to 262 (263) and field 2 covers lines 282 (283) to 524 (525) of the displayed image, interleaved in the scan direction.]

[figure 7-10. ccir 656 pixel timing. byte sequence on vo_data[0:7], one byte per vo_clk, starting at byte 0 of the line scan: u0 y0 v0 y1 u2 y2 v2 y3 u4 y4 ... at 27 mhz = 13.5 mpix/sec.]

[figure 7-11. ccir 656 line timing. each of line i and line i+1 consists of an eav code, blanking, an sav code, and active video (yuv 4:2:2 pixels).]

figure 7-12. format of sav and eav timing codes.

preamble:               11111111 00000000 00000000
timing reference code:  1 f v h p p p p
    f = 0 during field 1, f = 1 during field 2
    v = 1 during field blanking, v = 0 elsewhere
    h = 0 for sav, h = 1 for eav
    pppp = protection bits (error correction)
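The timing-code layout of figure 7-12 can be expressed as a short sketch. The bit positions (1, f, v, h, then four protection bits) come from the figure; the xor equations for the protection bits are not given in this chapter and are taken here from the ccir 656 recommendation, so treat them as an assumption. They do reproduce all eight binary values listed in table 7-3. The function names are hypothetical.

```python
def timing_code_byte(f, v, h):
    """Build the sav/eav code byte '1 f v h p3 p2 p1 p0'.

    f: 0 in field 1, 1 in field 2
    v: 1 during field (vertical) blanking, 0 elsewhere
    h: 0 for sav, 1 for eav
    Protection bits follow ccir 656 (assumed, not stated in this chapter).
    """
    p3 = v ^ h
    p2 = f ^ h
    p1 = f ^ v
    p0 = f ^ v ^ h
    return (0x80 | (f << 6) | (v << 5) | (h << 4)
            | (p3 << 3) | (p2 << 2) | (p1 << 1) | p0)


def timing_code(f, v, h):
    """Return the full 4-byte code: preamble 0xff, 0x00, 0x00 plus code byte."""
    return bytes([0xFF, 0x00, 0x00, timing_code_byte(f, v, h)])


if __name__ == "__main__":
    for f in (0, 1):
        for v in (0, 1):
            for h in (0, 1):
                name = "eav" if h else "sav"
                print(name, f + 1, v, format(timing_code_byte(f, v, h), "08b"))
```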
the sav and eav codes have a 4-bit protection field to ensure valid codes. the evo generates these protection bits as part of the sav and eav codes as defined by ccir 656. there are 8 possible valid sav and eav codes, shown with their correct protection bits in table 7-3. the evo generates sav and eav sync codes and inserts them into the video out data stream according to the ccir 656 specification under all conditions, whether it is generating or receiving horizontal and frame timing information.

7.8.4 video clipping

sav and eav codes are identified by a 3-byte preamble of 0xff, 0x00 and 0x00. this combination must be avoided in the video data output by the evo to prevent accidental generation of an invalid sync code. the evo provides programmable maximum and minimum value clipping on the video data to prevent this possibility. if clipping is enabled, the evo automatically clips the resulting image data as described in section 7.15.3.

7.8.5 ccir 656 frame timing

the interlaced frame timing defined by ccir 656 is shown in table 7-4. lines are numbered from 1 to 525 for 525-line, 60-hz systems and from 1 to 625 for 625-line, 50-hz systems. the field and vertical blanking columns indicate whether the field and vertical blanking bits, respectively, are set in the sav and eav codes for the indicated lines. the 525 and 625 formats have similar timing but differ in their line numbering.

7.9 enhanced video out timing generation

the evo generates timing for frames, active video areas within frames, images within the active video area, and overlays within the image area. the relationship between these four is shown in figure 7-13. the frame includes the timing for both interlaced fields. progressive scan, or non-interlaced video, is accomplished by setting the timing parameters such that two identical successive fields are generated.
7.9.1 active video area

shown in figure 7-13, the active video area begins after the horizontal and vertical blanking intervals and represents the pixels visible on the screen. the image area is the actual displayed image within the active video area. it can be slightly smaller than the active video area to avoid edge effects at the top, bottom and sides of the image. the overlay area is within the image area.

the evo uses counters to generate and control image timing. the frame line counter and frame pixel counter control the overall timing for the frame and define the total number of pixels per line, lines per frame, and interlace timing, including horizontal and vertical blanking intervals.

note that the frame line counter has a starting value of one, not zero, and it counts from 1 to 525 or 625, consistent with ccir 656 line numbering. the image line counter and image pixel counter define the visible image within the field.

the geometry of the active video area is defined by the contents of several mmio registers shown in figure 7-29. the vo_frame.field_2_start field defines the start line of field 2. field 2 is active when the field line counter contents equal or exceed this value.

the active video area is defined by the f1_video_line and f2_video_line fields of the vo_field register for each field of the frame, and by the video_pixel_start field of the vo_line register for each line of the frame. the active video area begins when the contents of the frame line counter and frame pixel counter equal or exceed these values.

table 7-3. sav and eav codes

code    binary value    field    vertical blanking
sav     1000 0000       1
eav     1001 1101       1
sav     1010 1011       1        x
eav     1011 0110       1        x
sav     1100 0111       2
eav     1101 1010       2
sav     1110 1100       2        x
eav     1111 0001       2        x

table 7-4. ccir 656 frame timing

line number (525/60)    line number (625/50)    f bit    v bit    comments
1-3                     624-625                 1        1        vertical blanking for field 1, sav/eav code still indicates field 2
4-19                    1-22                    0        1        vertical blanking for field 1, change sav/eav code to field 1
20-263                  23-310                  0        0        active video, field 1
264-265                 311-312                 0        1        vertical blanking for field 2, sav/eav code still indicates field 1
266-282                 313-335                 1        1        vertical blanking for field 2, change sav/eav code to field 2
283-525                 336-623                 1        0        active video, field 2
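Table 7-4 maps directly onto a small lookup. The sketch below transcribes the table rows and returns the f and v bits for a given frame line number; the names are hypothetical and the code only illustrates how the table is read, not how the evo line counters are implemented.

```python
# Rows transcribed from table 7-4: (first line, last line, f bit, v bit).
TIMING_525_60 = [
    (1, 3, 1, 1),
    (4, 19, 0, 1),
    (20, 263, 0, 0),
    (264, 265, 0, 1),
    (266, 282, 1, 1),
    (283, 525, 1, 0),
]

TIMING_625_50 = [
    (624, 625, 1, 1),
    (1, 22, 0, 1),
    (23, 310, 0, 0),
    (311, 312, 0, 1),
    (313, 335, 1, 1),
    (336, 623, 1, 0),
]


def field_and_blanking_bits(line, table):
    """Return (f_bit, v_bit) for a 1-based frame line number."""
    for first, last, f_bit, v_bit in table:
        if first <= line <= last:
            return f_bit, v_bit
    raise ValueError("line number outside the frame")


if __name__ == "__main__":
    # Line 2 of a 525-line frame is still flagged as field 2 (overlap period).
    print(field_and_blanking_bits(2, TIMING_525_60))
    print(field_and_blanking_bits(23, TIMING_625_50))
```

Note that f stands for the field flag as carried in the sav/eav code (0 for field 1, 1 for field 2), so lines inside the overlap period report the previous field.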
7.9.2 sav and eav overlap period

the ccir 656-compliant 525/60 and 625/50 timing specifications define an overlap period where the field number in the sav and eav codes from field 1 persists into the vertical blanking interval for field 2, and the codes for field 2 persist into the vertical blanking interval for field 1. the f1_olap and f2_olap fields of the vo_field register define these overlap intervals.

f1_olap and f2_olap are small two's complement values in the range -8 to +7. a positive value indicates that the overlap extends into the current field, while a negative value indicates that it extends backward into the previous field. see figure 7-31 for the effect of negative and positive values.

during the overlap interval, the vertical blanking for the next field has begun; however, the field number flag in the sav and eav codes still shows the field number for the previous field. the field number is updated to the correct field value at the end of the overlap interval.

f1_olap defines the overlap from field 1 to field 2. this overlap occurs during the beginning of vertical blanking for field 2. the sav and eav codes continue to show field 1 during this overlap interval, and they change to field 2 at the end of the interval.

f2_olap defines the overlap from field 2 to field 1. this overlap occurs during the beginning of vertical blanking for field 1. the sav and eav codes continue to show field 2 during this overlap interval, and they change to field 1 at the end of the interval.

7.9.3 control of frame and image counters

the frame and image counters have different start and stop points. the frame counters begin in the vertical blanking interval of the first field and the horizontal blanking interval of the first line. they stop counting when they reach the height and width values of the frame.
when the evo generates frame timing, the frame counters are reset to their start values when they reach their stop values. when the evo receives frame timing signals, the frame counters continue counting until reset by the external signals.

the image area is defined by the vo_ythr register fields image_voff and image_hoff. these values are added to the f1_video_line or f2_video_line and video_pixel_start values to define the starting line and pixel, respectively, of the image area. the image area is active when the contents of the frame line counter and frame pixel counter equal or exceed these values.

the image line counter and image pixel counter start counting at the first active pixel in the image area and the first active line in the image area, respectively. the image counters start at zero and stop counting when they reach their image height and width values. the image counters are reset by frame counter values indicating the start of the image pixel in a line and the start of the image line in a field.

the image counters define the active image area of the frame, the area of interest for image processing. this allows the overlay start address to be defined relative to the active image area, for example. when the evo is not sending out active pixels from the image area, it sends out blanking codes. the blanking codes are 0x80, 0x10, 0x80, and 0x10 for each 2-pixel group in yuv 4:2:2 image data format, as defined by ccir 656 and shown in figure 7-10.

7.9.4 horizontal and frame timing signals

the evo can supply horizontal and frame timing signals or receive a frame timing signal from an external source. when vo_ctl.sync_master = 1, the evo generates horizontal and frame timing for the external video device. when sync_master = 0, the evo operates in genlock mode and an external device, such as a denc, must provide frame sync. this section describes evo operation when it is sync master. see section 7.10 for a description of genlock mode.
if sync_master = 1, the vo_io1 signal generates a horizontal timing signal, and the vo_io2 signal generates a frame timing signal. when evo_enable = 1 and field_sync = 1, the vo_io2 signal indicates the field number (low = field 1, high = field 2), according to the sav/eav field indication (bit[6]) as shown in figure 7-14. the vo_io2 signal toggles just before the first byte of the preamble that protects the eav code and after the sav code. non-interlaced output can be simulated by programming the evo to generate fields equivalent to the desired frames. in this case, vo_io2 indicates odd or even frames.

[figure 7-13. active video area and image area in relation to vertical and horizontal blanking intervals. within the frame, each field has vertical blanking and horizontal blanking around an active video area (located by start line and start pixel); the image area (image width by image height) sits inside the active video area at the image h offset and image v offset, and the overlay lies inside the image area.]
the horizontal timing signal vo_io1, shown in figure 7-15, corresponds to the horizontal-blanking interval. it is active low from the eav code at the start of the line to the sav code at the start of active video for the line.

7.10 genlock mode

in genlock mode, the evo is not synchronization master but receives frame timing signals on vo_io2. the evo operates in genlock mode when sync_master = 0, evo_ctl.evo_enable = 1 and evo_ctl.genlock = 1.

the active edge can be programmed using the vo_ctl.vo_io2_pos bit. the initial transition of the frame timing signal on vo_io2 causes the frame line counter to be set to the value in vo_frame.frame_preset. after reaching frame_length, the frame line counter starts counting again from 1.

evo_slvdly.slave_dly is typically used to compensate for any delay in the frame timing source or internal pipeline synchronization anywhere in a line. internally, the active edge of vo_io2 is delayed by slave_dly vo_clk clock cycles. typically, this allows frame_preset to be loaded at the beginning of a new line. with correct values of slave_dly and frame_preset loaded, the pnx1300 can generate frames totally synchronized with the active edge of vo_io2. all the internal mmio registers (except vo_ctl) should be programmed with the same values as for sync_master mode. see figure 7-16.

in genlock mode, the evo is free-running according to the values programmed in its internal registers before the initial vo_io2 active edge. just after receiving the active edge that will synchronize the evo, output values may be erroneous for several vo_clk cycles, but it is guaranteed that the next frame will be correct.

after the first synchronizing edge, if the next one happens according to the values programmed in the evo mmio registers, no change will appear in the output timing of the evo.
if the active edge of vo_io2 does not match the programmed value, a new synchronization phase is performed.

typically, this is programmed as follows: slave_dly is loaded with the number of clock cycles for one video line minus the number of delay cycles used by the evo to synchronize itself. frame_preset is programmed with the value 2. with this programming, the active edge of vo_io2 will happen just before the first byte (preamble) of the first line. the first active edge of vo_io2 is delayed internally by slave_dly vo_clk cycles so that it appears internally just before the start of the second line minus the internal evo pipeline delay. after this internal pipeline delay, the line counter is loaded by frame_preset ('2'), and the evo starts sending data for line 2. for the next frame, if the internal evo programming matches the vo_io2 timing, the evo will appear to start the first byte of the first line just after the vo_io2 active signal.

[figure 7-14. evo vo_io2 timing in field_sync mode. vo_io2 level over one frame, shown against the ntsc and pal line numbers that bound the blanking and active video intervals of field 1 and field 2.]

[figure 7-15. evo vo_io1 timing in field_sync mode. over one image line (field width in pixels), vo_io1 is shown against eav, blanking, sav and image data (image width in pixels).]

7.11 data transfer timing

in data-streaming and message-passing modes, the evo supplies a stream of 8-bit data. no data selection or data interpretation is done, and data is transferred at the rate of one byte per vo_clk. data is clocked out on the positive edge of vo_clk.

when data-streaming mode is enabled and evo_enable = 1 and sync_streaming = 1, the vo_io2 signal indicates a data-valid condition. this signal is asserted when the evo starts outputting valid data (that is, data-streaming mode is enabled and video out is running), and is de-asserted when data-streaming mode is disabled. as shown in figure 7-17, the data-valid signal on vo_io2 is asserted just before the first valid byte is present on vo_data[7:0], and is de-asserted just after the last valid byte has been sent, or if an hbe error is signaled. all transitions of vo_io2 occur on the rising edge of vo_clk. the vo_io1 signal generates a pulse one vo_clk cycle before the first valid data is sent. the transitions of vo_io1 occur on the rising edge of vo_clk and last for one vo_clk cycle.

in message-passing mode, the evo issues signals on vo_io1 and vo_io2 to indicate the start and end of messages. when message passing is started by setting vo_ctl.vo_enable, the evo sends a start condition on vo_io1. when the evo has transferred the contents of the buffer, it sends an end condition on vo_io2, sets bfr1_empty, and interrupts the dspcpu. the evo stops, and no further operation takes place until the dspcpu sets vo_enable again to start another message, or until the dspcpu initiates other evo operation. the timing for these signals is shown in figure 7-18.

7.12 image data memory formats

7.12.1 video image formats

the evo accepts memory-resident video image data in three formats: yuv 4:2:2 co-sited, yuv 4:2:2 interspersed, and yuv 4:2:0.
these formats are shown in figure 7-19 through figure 7-21.

[figure 7-16. genlock mode. the active edge of vo_io2 is delayed by slave_dly vo_clk cycles, after which the line counter is loaded by frame_preset; one frame runs from line 1 through line 525/625.]

[figure 7-17. data-streaming valid data signals. waveforms of vo_clk, vo_io1, vo_io2 (data_valid) and vo_data[7:0] carrying bytes d0, d1, d2, ... dk.]

[figure 7-18. message-passing start and end signals. waveforms of vo_clk, vo_io1 (start of message), vo_io2 (end of message) and vo_data[7:0] carrying bytes d0 to d7.]
7.12.2 planar storage of video image data in memory

video image data is stored in memory with one table for each of the y, u and v components. this is called planar format. this is shown in figure 7-22 for yuv 4:2:2 image data. the evo merges bytes from each of the three tables to generate the ccir 656-compatible output data. the u and v tables have the same number of lines but half the number of pixels per line as the y table.

the transfer is the same for yuv 4:2:0 format except the u and v tables will be 1/4 the size of the y table. the u and v tables have half the number of lines and half the number of pixels per line as the y table.

7.12.3 graphics overlay image format

graphics overlay image data is stored in a pixel-packed format in sdram. graphics images are stored in yuv 4:2:2+alpha format. figure 7-23 shows this format. the yuv overlay area is always within the image output resolution. the evo does not upscale the graphics overlay image. if the evo is upscaling the video image by 2x, the graphics overlay must be provided in upscaled format.

pixel data is 16-bit data and follows endian-ness conventions based on 16-bit data. refer to appendix c, 'endian-ness', for details.

7.13 video image conversion algorithms

the memory video image data formats are converted to the output yuv 4:2:2 co-sited format and optionally upscaled 2x horizontally. the conversion algorithms are detailed below.

[figure 7-19. yuv 4:2:2 co-sited format. positions of chrominance (u,v) samples relative to luminance samples.]

[figure 7-20. yuv 4:2:2 interspersed format. positions of chrominance (u,v) samples relative to luminance samples.]

[figure 7-21. yuv 4:2:0 format. positions of chrominance (u,v) samples relative to luminance samples.]
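As a rough illustration of the planar layout in section 7.12.2, the sketch below computes the table dimensions for the two chroma formats and merges one line of planar yuv 4:2:2 co-sited data into the u0 y0 v0 y1 byte order of figure 7-10. It is only a software model under simplifying assumptions: an even image width, no interspersed-to-co-sited conversion, no upscaling, no overlay and no clipping. The function names are hypothetical.

```python
def plane_shapes(width, height, fmt):
    """Return (pixels per line, lines) for the y, u and v tables.

    fmt is "4:2:2" (u, v: half the pixels per line, same number of lines)
    or "4:2:0" (u, v: half the pixels per line and half the lines).
    """
    if fmt == "4:2:2":
        chroma = (width // 2, height)
    elif fmt == "4:2:0":
        chroma = (width // 2, height // 2)
    else:
        raise ValueError("unsupported format")
    return {"y": (width, height), "u": chroma, "v": chroma}


def merge_422_cosited_line(y_line, u_line, v_line):
    """Interleave one planar 4:2:2 co-sited line as u0 y0 v0 y1 u2 y2 v2 y3 ..."""
    if len(y_line) != 2 * len(u_line) or len(u_line) != len(v_line):
        raise ValueError("plane lengths do not match a 4:2:2 line")
    out = bytearray()
    for i in range(len(u_line)):
        out += bytes([u_line[i], y_line[2 * i], v_line[i], y_line[2 * i + 1]])
    return bytes(out)


if __name__ == "__main__":
    print(plane_shapes(720, 480, "4:2:0"))
    print(merge_422_cosited_line([16, 17, 18, 19], [128, 129], [130, 131]).hex())
```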
7.13.1 yuv 4:2:2 interspersed to yuv 4:2:2 co-sited conversion

the evo accepts data from sdram in either yuv 4:2:2 co-sited, yuv 4:2:2 interspersed, or yuv 4:2:0 interspersed formats. if the input data is in yuv 4:2:2 or yuv 4:2:0 interspersed format, interspersed-to-co-sited conversion is performed to generate co-sited output. the evo uses a 4-tap, (-1, 5, 13, -1)/16 filter to perform this conversion on the u and v chroma data. figure 7-24 shows an example of interspersed to co-sited conversion.

7.13.2 yuv 4:2:0 to yuv 4:2:2 co-sited conversion

yuv 4:2:0 to yuv 4:2:2 conversion is a variation of yuv 4:2:2 interspersed-to-co-sited conversion. the yuv 4:2:0 format has the u and v pixels positioned between lines as well as between pixels within each line. it also has half the number of u and v pixels compared to yuv 4:2:2 formats. the evo converts yuv 4:2:0 to yuv 4:2:2 co-sited by using the u and v chrominance pixel values for both surrounding lines and converting the resulting u and v pixels from interspersed to co-sited format. this is shown in figure 7-25. for true vertical re-sampling of u and v, the pnx1300 icp unit can be invoked on u and v to convert from yuv 4:2:0 to yuv 4:2:2 interspersed.

7.13.3 yuv-2x upscaling

in the yuv-2x modes, the evo performs 2x horizontal upscaling of the yuv data from sdram. no vertical upscaling is performed. the width of the result image (image_width) should be an even number. upscaling is performed by 4-tap filtering. for all 3 memory formats, y luminance data is upscaled using a (-3,19,19,-3)/32 filter to generate the missing output pixels. output pixels at the same location as the input pixels use the corresponding input pixel values, as shown in figure 7-26.
the u and v chrominance values are generated in the same way as the y luminance signal for 2x upscaling, assuming that both the input and output use yuv 4:2:2 co-sited chrominance coding. the u and v output pixels at the same location as the u and v input pixels use the corresponding input pixel values. the u and v output pixels between the u and v input pixels are generated using the (-3,19,19,-3)/32 filter, as shown in figure 7-26.

if the input chroma is interspersed, a (-1,13,5,-1)/16 filter is used to generate the u and v output pixels that are displaced by half a y pixel from the u and v input pixels, and a (-1,5,13,-1)/16 filter is used to generate the additional upscaled u and v output pixels that are displaced by 1.5 pixels from the u and v input pixels. this is shown in figure 7-27.

7.13.4 pixel mirroring for four-tap filters

the evo uses a 4-tap filter for upscaling and for converting from interspersed to co-sited format. one extra pixel is needed at the beginning and two at the end of each line processed by this filter. these pixels are supplied automatically by mirroring the first and last pixels of each line. for example:

• output pixel 1 uses input pixel 1 to generate its value (same location, no filtering).
• output pixel 2 uses pixels 1, 1, 2 and 3 to generate its value.
• output pixel 3 uses pixel 2 to generate its value.
• output pixel 4 uses pixels 1, 2, 3 and 4, etc.
• ...
• output pixel 2n-2 uses pixels n-2, n-1, n and n to generate its value.
• output pixel 2n-1 uses pixel n to generate its value.
• output pixel 2n uses pixels n-1, n, n and n-1 to generate its value.

figure 7-28 shows an example of six pixels upscaled to 12 pixels.

figure 7-22. image storage in planar memory format for yuv 4:2:2. (the y table holds height lines of width pixels, pix0 to pix w-1, starting at y_base_adr with line offset y_offset. the u table holds height lines of width/2 pixels, pix0, pix2, ..., starting at u_base_adr with line offset u_offset. repeated for v_base_adr, v_offset.)

figure 7-23. yuv 4:2:2+alpha overlay format. (overlay_height lines of overlay_width pixels, pix0 to pix w-1, starting at ol_base_adr with line offset ol_offset. each pixel pair is shown as y0, u0, y1, v0.)

figure 7-24. yuv interspersed to co-sited conversion. (input pixels: yuv. output pixels: yu'v'. co-sited chrominance output: u',v' = (-1,5,13,-1)/16 u,v.)

figure 7-25. yuv 4:2:0 to yuv 4:2:2 co-sited conversion. (input pixels: yuv 4:2:0. output pixels: yu'v' 4:2:2 co-sited. co-sited chrominance output: u',v' = (-1,5,13,-1)/16 u,v. input samples y0, y1, y2, y3 with u0,v0 and u2,v2 give output pixels y0,u0,v0; y1,u0,v0; y2,u2,v2; y3,u2,v2.)

figure 7-26. 2x upscaling of y pixels. (input pixels: yuv. output pixels: y'u'v'. output at the same location as an input pixel: y'u'v' = yuv. upscaled luminance output between input pixels: y' = (-3,19,19,-3)/32 y. upscaled chrominance output between input pixels: u',v' = (-3,19,19,-3)/32 u,v.)

figure 7-27. 2x upscaling of u and v with interspersed to co-sited conversion. (input pixels: yuv. output pixels: y'u'v'. co-sited chrominance outputs: u',v' = (-1,13,5,-1)/16 u,v and u',v' = (-1,5,13,-1)/16 u,v. upscaled luminance output at the same location as an input pixel: y' = y. upscaled luminance output between input pixels: y' = (-3,19,19,-3)/32 y.)

figure 7-28. mirroring pixels in 2x upscaling. (six input pixels y1 to y6, twelve output pixels y'.)

output pixel   value
1              y' = y1
2              y' = f(y1,y1,y2,y3)
3              y' = y2
4              y' = f(y1,y2,y3,y4)
5              y' = y3
6              y' = f(y2,y3,y4,y5)
7              y' = y4
8              y' = f(y3,y4,y5,y6)
9              y' = y5
10             y' = f(y4,y5,y6,y6)
11 (2n-1)      y' = y6
12 (2n)        y' = f(y5,y6,y6,y5)

7.14 evo operating modes

evo operating modes belong to two groups as follows:

• video-refresh modes
• data-transfer modes

data-transfer modes are further broken down into data-streaming mode and message-passing mode.

the operating mode is set by the vo_ctl.mode field and the vo_ctl.ol_en (overlay enable) control bit. the vo_ctl.mode field determines video-refresh, message-passing or data-streaming mode. it further defines the video image format and whether or not 2x horizontal upscaling takes place. the ol_en bit determines whether a video-refresh mode has a graphics overlay present. the modes are shown in table 7-5.

table 7-5. evo operating modes

mode        function          explanation
video-refresh modes
0           yuv 4:2:2c-1x     yuv 4:2:2 co-sited, no scaling
1           yuv 4:2:2i-1x     yuv 4:2:2 interspersed, no scaling
2           yuv 4:2:0-1x      yuv 4:2:0, no scaling
3           reserved
4           yuv 4:2:2c-2x     yuv 4:2:2 co-sited, horizontal 2x upscaling
5           yuv 4:2:2i-2x     yuv 4:2:2 interspersed, horizontal 2x upscaling
6           yuv 4:2:0-2x      yuv 4:2:0, horizontal 2x upscaling
7           reserved
data-transfer modes
8           data streaming    continuous transmission of raw 8-bit data with valid data pulse and level timing signals
9           message passing   transmission of raw 8-bit data with stmsg and endmsg timing signals
0xa - 0xf   reserved

7.15 video processing

if enabled, the pnx1300 implements functions for chroma keying, alpha blending and programmable clipping, as described in this section.

7.15.1 alpha blending

if enabled by setting evo_enable = 1 and full_blending = 1, the evo provides full 129-level alpha blending of a background video image with a foreground graphics overlay image. if either bit is 0, the evo implements the cruder 25% step alpha blending resolution of the tm-1000. alpha blending can operate in conjunction with chroma keying, as described in section 7.15.2.

alpha blending combines a graphics overlay image with the video image according to an alpha value provided with each overlay pixel. the graphics overlay is taken from a pixel-packed yuv 4:2:2+alpha data structure in memory. in the yuv 4:2:2+alpha format, each pixel has a single alpha bit supplied as the lsb of the u and v pixels. the u byte lsb corresponds to the alpha for pixel y0, the v byte lsb for pixel y1, respectively.

when the alpha bit is '0', the alpha_zero register supplies the actual 8-bit alpha value. when the alpha bit is '1', the alpha_one register supplies the 8-bit alpha value. in the yuv 4:2:2 format, only one set of u and v values is supplied for the two y pixels, y0 and y1. in this case, the alpha bit in u0 determines the alpha value for u, y0 and v. the alpha blend bit in v0 only sets the alpha value for y1 and does not affect the u or v values.

the evo uses the 8-bit content of the selected alpha blending register (alpha_zero or alpha_one) to determine the amount by which the overlay plane is merged with the image plane as follows. the least-significant 7 bits of the selected blending register encode 128
blending levels from 0 to 0x7f. the msb is used to turn on blending (msb = '0') or to select the overlay plane as the only output (msb = '1'), so all values between 0x80 and 0xff select 100% overlay. therefore, the total number of blending levels is 129: 128 variable blending values from 0 to 0x7f plus one "blending" value from 0x80 to 0xff for 100% overlay. an alpha value of 0 selects 100% image plane and 0% overlay. similarly, a value of 0x40 selects 50% image and 50% overlay blending. the equations for the blending are illustrated below.

7.15.2 chroma keying

if the evo_enable and key_enable bits are set to '1' in evo_ctl, the pnx1300 activates chroma keying. the graphics overlay is taken from a pixel-packed yuv 4:2:2+alpha data structure in memory. the evo_key register provides the value which signifies full transparency for the overlay. the overlay values (y, u and v) are compared to the values stored in bit-fields of the evo_key register. evo_key has three 8-bit fields: key_y, key_u and key_v, which store the values to be compared to the y, u, and v components, respectively, of the overlay for chroma keying. bits that correspond to bits set in mask_y and mask_uv are ignored for the comparison. when there is an exact match between the pixel value and the value in evo_key (disregarding any bits masked by mask_y and mask_uv), then the overlay value is not present in the output stream, resulting in full transparency.

the mask bits in evo_mask provide for varying degrees of precision in the chroma-key matching process. the evo_mask.mask_y field can mask from 0 to 4 lsbs of the overlay y component during the chroma key process. for example, setting mask_y = 1 eliminates the influence of the lsb of key_y in the keying process. this can be used to widen the range of key matching to account for irregularities in the chroma-key video signal. likewise, evo_mask.mask_uv is used to mask from zero to four lsbs of the overlay u and v components during the chroma key process. for example, setting mask_uv = 1 eliminates the influence of the lsb of key_u and key_v in the keying process.

7.15.3 programmable clipping

if evo_ctl.clipping_enable = 1, the evo performs fully-compliant programmable clipping. clipping is performed as the last step of the video pipeline, after chroma keying and alpha blending. it is applied only on the image areas (field 1 and field 2) defined by image_width, image_height, image_voff and image_hoff inside the active video area. blanking values are not clipped.

the evo_clip mmio register stores four 8-bit fields used to clip output components. the y output component is clipped between the values stored in lower_clipy and higher_clipy. a value less than or equal to lower_clipy is forced to lower_clipy and a value greater than or equal to higher_clipy is forced to higher_clipy. the same behavior is implemented for u and v with the values stored in the lower_clipuv and higher_clipuv fields.

this mode allows fully-compliant 16 to 235 y clipping and 16 to 240 cb and cr clipping to be programmed. these are the default values of the evo_clip register after reset.

if clipping_enable = 0, the evo clips y, u and v between the default values 16 and 240, as it is implemented in the tm-1000.

when the lower_clip{y,uv} fields are set to '0' and the higher_clip{y,uv} fields are set to '255', no clipping is performed.

7.16 mmio registers

the mmio registers are in two groups:

• vo registers: control basic vo functions (those shared with the tm-1000 vo unit)
• evo registers: control new evo unit functions (those new in tm-1100/tm-1300/pnx1300)

vo mmio registers are shown in figure 7-29. vo mmio register names are prefixed with "vo_". generally, their functionality is unchanged except where noted in the text (see, for instance, section 7.16.1). the register fields are described in table 7-6, table 7-7 and table 7-8. they are discussed in sections 7.16.1 through 7.18.1.

evo mmio registers are shown in figure 7-30. evo mmio register names are prefixed with "evo_". the evo_ctl register selectively enables new tm-1100/tm-1300/pnx1300 functions. the register fields are described in table 7-9 and table 7-10. they are discussed in sections 7.16.4 and 7.16.5.

to ensure compatibility with future devices, any undefined mmio bits should be ignored when read, and written as '0's.

alpha blending equations (see section 7.15.1):

    if alpha[7] = 1 then
        output[7:0] = overlay[7:0]
    else
        output[7:0] = (alpha[6:0] x overlay[7:0] + (128 - alpha[6:0]) x image[7:0]) >> 7
    (or)
        output[7:0] = ((alpha[6:0] x (overlay[7:0] - image[7:0])) >> 7) + image[7:0]
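The blending equations of section 7.15.1 can be exercised with a short sketch. The function names `select_alpha` and `blend` are hypothetical. The first form is taken here as alpha x overlay + (128 - alpha) x image, which is what the second form expands to and which matches the stated end points (alpha = 0 gives 100% image, alpha = 0x40 gives a 50/50 mix). Truncation by the right shift is an assumption about rounding.

```python
def select_alpha(alpha_bit, alpha_zero, alpha_one):
    """Pick the 8-bit blend value for one overlay pixel.

    alpha_bit is the lsb of the u byte (for y0) or the v byte (for y1).
    """
    return alpha_one if alpha_bit else alpha_zero


def blend(alpha, overlay, image):
    """Blend one 8-bit overlay component with one 8-bit image component."""
    if alpha & 0x80:
        # msb set: the overlay plane is the only output.
        return overlay
    a = alpha & 0x7F
    return (a * overlay + (128 - a) * image) >> 7


def blend_alt(alpha, overlay, image):
    """Second form of the equation; agrees with blend() when >> floors."""
    if alpha & 0x80:
        return overlay
    a = alpha & 0x7F
    return ((a * (overlay - image)) >> 7) + image
```

Note that with a 7-bit weight the largest variable level, 0x7f, still leaves 1/128 of the image in the result; only values of 0x80 and above give a pure overlay, which is why the data book counts 129 levels.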
figure 7-29. evo mmio registers. (register map. offsets are relative to mmio_base; fields are listed per register, without bit positions. in the figure, '1' indicates evo functionality.)

register     access   offset      fields
vo_status    r        0x10 1800   cur_y(12), cur_x(12), bfr1_empty, bfr2_empty, hbe, 1, ytr, urun, field2, vblank
vo_ctl       r/w      0x10 1804   reset, sleepless, clock_select, pll_s, pll_t, clkout, sync_master, vo_io1_pos, vo_io2_pos, ol_en, mode, bfr1_ack, bfr2_ack, hbe_ack, urun_ack, ytr_ack, bfr1_inten, bfr2_inten, hbe_inten, urun_inten, ytr_inten, ltl_end, vo_enable, reserved
vo_clock     r/w      0x10 1808   frequency
vo_frame     r/w      0x10 180c   frame_preset, field_2_start, frame_length
vo_field     r/w      0x10 1810   f2_olap, f2_video_line, f1_olap, f1_video_line
vo_line      r/w      0x10 1814   video_pixel_start, frame_width
vo_image     r/w      0x10 1818   image_height, image_width
vo_ythr      r/w      0x10 181c   y_threshold, image_voff, image_hoff
vo_olstart   r/w      0x10 1820   ol_start_line, ol_start_pixel, global alpha 1
vo_olhw      r/w      0x10 1824   overlay_height, overlay_width, global alpha 0
vo_yadd      r/w      0x10 1828   y_base_adr or bfr1base_adr
vo_uadd      r/w      0x10 182c   u_base_adr or bfr2base_adr
vo_vadd      r/w      0x10 1830   v_base_adr or size1
vo_oladd     r/w      0x10 1834   ol_base_adr or size2
vo_vuf       r/w      0x10 1838   u_offset(16), v_offset(16)
vo_yolf      r/w      0x10 183c   y_offset(16), ol_offset(16)
7.16.1 vo status register (vo_status)

the vo_status register is a read-only register that shows the current status of the evo. its fields are shown in figure 7-29 and table 7-6.

vo_status[4] is now hard-wired to '1'. this allows software to determine if the unit is an evo unit (containing extra mmio registers) or a tm-1000 vo unit, as follows. in the tm-1000, this bit is a copy of the hbe flag (vo_status[5]). in the evo unit, it is hard-wired to '1'. software can use this bit to determine the type of (e)vo unit by clearing the hbe bit then reading vo_status[4]. if the bit remains '1', the unit is an evo.

table 7-6. vo_status - status register fields

cur_y
    current y. image line index of the current line in the current field being output by the evo. cur_y reflects the current state of the image line counter.
    cur_x and cur_y form a single 24-bit output data byte counter (cur_x is the counter lsbs) when the evo is in data-streaming or message-passing mode. this counter reflects the status of the size counter for the currently active buffer. the two lsbs of this counter are not valid for reading during transfers; only the upper 22 bits (the word count) are valid.

cur_x
    current x. image pixel index of the most-recently-output pixel. cur_x reflects the current state of the image pixel counter.

bfr1_empty, bfr2_empty
    buffers 1 and 2 empty. these bits are valid in video-refresh, data-streaming and message-passing modes.
    • in video-refresh modes, only buffer 1 is used. bfr1_empty indicates that the last byte of a field has been transferred. it is actually raised at the completion of the transmission of the overlap area of the field, as shown in figure 7-31. at this point, software should assign a new field of imagery to {y,u,v}_base_adr and perform a bfr1_ack. if bfr1_empty is not cleared by bfr1_ack before the active video area of the next field starts to be emitted, the evo sets the urun bit.
    • in data-streaming mode, bfr1_empty and bfr2_empty indicate that the last byte in their corresponding buffer has been transferred. when bfr1_empty or bfr2_empty is set, transfer stops from the corresponding buffer.
    • in message-passing mode, bfr1_empty signals completion of message transmission.
    these bits cause an interrupt if their interrupt-enable bits are set. one interrupt per buffer is signaled.

hbe
    highway bandwidth error. hbe is set when the highway fails to respond in time to a highway read request and data was not ready in time to be sent on the evo data lines. hbe can be set in both image- and data-transfer modes. hbe indicates insufficient bandwidth was requested from the highway arbiter.

1
    evo unit indicator. this bit allows software to determine if the unit is an evo (containing extra mmio registers) or a tm-1000 vo unit. in the tm-1000, this bit is a copy of the hbe flag. in the evo unit, it is hard-wired to '1'. software can easily determine the type of video output unit by clearing the hbe bit then reading this bit.

ytr
    y threshold. in video-refresh modes, ytr indicates that the image line counter value is equal to the y_threshold value in vo_ythr. the y_threshold value can be set to provide an interrupt on any line in the valid image area.

urun
    underrun. in video-refresh and data-streaming modes, this bit indicates that the cpu did not perform an acknowledge to indicate updated address pointers for the next field or buffer in time for continuous image or data transfer. urun causes an interrupt if the corresponding interrupt-enable condition is set.
    • in video-refresh modes, urun indicates that the sav code marking the beginning of active video has been generated without bfr1_ack being set by the cpu. (setting bfr1_ack to '1' clears bfr1_empty.) in this case, video refresh continues with the previous address pointers.
    • in data-streaming mode, urun indicates that the last byte in the active buffer was transferred, and no bfr1_ack or bfr2_ack occurred to enable the next buffer. in this case, transfer continues with the previous address pointers.

field2
    field 2 or buffer 2 active.
    • in data-streaming mode, field2 = 0 when buffer 1 is active; field2 = 1 when buffer 2 is active.
    • in video-refresh modes, field2 indicates that the evo is actively sending out a video image for field 2, as defined by figure 7-31.

vblank
    vertical blanking. indicates that the evo is in a vertical-blanking interval. vblank is asserted only in video-refresh modes.
7.16.2 vo control register (vo_ctl)

the vo_ctl register sets the operating mode, enables interrupts, clears interrupt flags, and initiates evo operations. its fields are unchanged from the tm-1000, as shown in figure 7-29 and table 7-7; however, the precise functionality implemented by a field may be changed if pnx1300 functionality is enabled by software. its hardware reset value is 0x32400000, which sets clock_select = 3, pll_s = 1 and pll_t = 1, and all other bits to '0'. to ensure compatibility with future devices, any undefined mmio bits should be ignored when read, and written as '0's.

table 7-7. vo_ctl register fields

reset
    software reset of the evo. the recommended software reset procedure is as follows.
    • write the desired vo_ctl state with the reset bit set to '1'.
    • write the desired vo_ctl state word, this time with the reset bit cleared to '0'. both writes should have vo_enable set to 0.
    • finally, enable the newly selected mode by setting vo_enable. this step should be done last, as a separate transaction.
    after a software reset, 5 vo_clk clock cycles are required to stabilize the internal circuitry (before enabling the evo).
    note: a hardware reset clears the clkout and sync_master bits and puts vo_clk, vo_io1, and vo_io2 in the input state. this results in a vo_ctl value of 0x32400000. in contrast, a software reset does not change device registers, so a software reset results in a state as specified by the vo_ctl word value written during the above-described procedure.

sleepless
    disable power management. if sleepless = 1, power-down of the evo is prevented during global pnx1300 power-down.

clock_select
    clock select.
    00: select pll vco output as the vo_clk source.
    01: select pll feedback loop divider output as vo_clk source.
    10: select pll input divider output as vo_clk source.
    11: select dds output directly as vo_clk source, bypassing the pll altogether. (hardware reset default.)

pll_s
    pll input divider division ratio. a value of k selects division by k+1. the hardware reset default = 1, causing division by 2.

pll_t
    pll feedback loop divider division ratio. a value of k selects division by k+1. the hardware reset default = 1, causing division by 2.

clkout
    clock output.
    • when clkout = 1, the evo clock generator is enabled, and vo_clk is an output.
    • when clkout = 0, vo_clk is an input, and the evo clock is provided by the external device. (hardware reset default.)

sync_master
    sync master.
    • when set, vo_io1 and vo_io2 are outputs. in video-refresh modes, the evo generates horizontal and frame timing signals on vo_io1 and vo_io2 respectively. in message-passing mode and data-streaming mode, this bit should always be set so that vo_io1 and vo_io2 generate start and end message signals respectively.
    • when zero, vo_io2 is an input. (hardware reset default.) in video-refresh modes, vo_io2 serves as the frame time reference. the active edge is selected by vo_io2_pos.

vo_io1_pos, vo_io2_pos
    polarity of vo_iox_pos. vo_io1_pos currently has no function. vo_io2_pos determines the input polarity of vo_io2.
    • when '0', the corresponding input triggers on the negative (high-to-low) transition of the input signal.
    • when '1', the input triggers on the positive (low-to-high) transition.

ol_en
    overlay enable. enables the yuv overlay function in video-refresh modes.

mode
    major operating mode. defines the video output major operating mode, as listed in table 7-5.

bfr1_ack, bfr2_ack
    buffer 1 and buffer 2 acknowledge. when active in data-transfer modes, writing a '1' to bfr1_ack clears bfr1_empty and enables buffer 1 for transfer until bfr1_empty is set. writing a '0' to bfr1_ack has no effect. bfr2_ack operates similarly for buffer 2.
    writing a '1' to vo_enable in data-streaming mode is the same as writing a '1' to both bfr1_ack and bfr2_ack, and enables both buffers 1 and 2 for transfer. writing a '1' to vo_enable in message-passing mode is the same as writing a '1' to bfr1_ack, and enables buffer 1 for transfer. bfr2_ack is not used in message-passing mode, since only buffer 1 is used.

hbe_ack, urun_ack
    acknowledge hbe or urun. writing a '1' to these bits clears the hbe or urun flags and resets their corresponding interrupt conditions.
table 7-7. vo_ctl register fields (continued)

ytr_ack
    acknowledge y threshold. writing a '1' to this bit clears the ytr flag and resets its interrupt condition. ytr signals the cpu to set new pointers for the next field. if ytr_ack is not received by the time the active image area for the next field starts, the urun flag is set. data transfer continues with the old pointer values.

bfr1_inten, bfr2_inten, hbe_inten, urun_inten, ytr_inten
    enable interrupt conditions. enable corresponding interrupts to be generated when the bfr1_empty, bfr2_empty, hbe, urun (underrun/end of transfer), and ytr (end of field/buffer) flags are set, respectively.
    note: bfr2_inten, urun_inten and ytr_inten must be 0 in message-passing mode.

ltl_end
    little-endian. specifies that data in sdram is stored in little-endian format. this only affects the overlay packed-image format interpretation in video-refresh modes. refer to appendix c, "endian-ness", for details on byte ordering.

vo_enable
    enable the evo to send image data or message data to its output.
    note: this bit should not be simultaneously asserted with the reset bit. the correct sequence to reset and enable the evo is as follows.
    • set all vo_ctl control fields as desired, writing vo_ctl with reset = 1, vo_enable = 0.
    • retain all desired values of control fields, but rewrite vo_ctl with reset = 0, vo_enable = 0.
    • finally, still retaining all desired control fields, rewrite vo_ctl with reset = 0, vo_enable = 1.
    setting vo_enable in video-refresh modes starts the evo sending image data beginning with the first pixel in the image. setting vo_enable in data-streaming and message-passing modes starts the evo sending data beginning with the first byte in buffer 1. in video-refresh and data-streaming modes, vo_enable remains set until cleared by the cpu. in message-passing mode, vo_enable is cleared when bfr1_empty is set, indicating the end of message transfer.
    note: de-asserting vo_enable in video-refresh modes causes sdram reads to stop, but sync framing and bfr1_empty generation and interrupts remain fully operational. the transmitted active image data is undefined in this case. to fully halt video output, a software reset is required.

7.16.3 vo-related registers

the vo-related registers and their fields are shown in table 7-8. their fields are unchanged from the tm-1000; however, their function may vary depending upon the pnx1300 features that are selectively enabled by evo_ctl (see section 7.16.4).

table 7-8. vo register fields

vo_clock
    frequency
        vo_clk frequency. see the dds equation in figure 7-6, and the pll description in section 7.19.

vo_frame
    frame_length
        total number of lines per frame; the ending value of the frame line counter; typically 525 or 625. note: the frame line counter counts from 1 to 525 or 625, consistent with ccir 656 line numbering.
    field_2_start
        start line number in the frame line counter where the second field of the frame begins. if non-interlaced pictures are desired, then the same value is programmed for field 1 and field 2. field 1 becomes frame 1 and field 2 becomes frame 2.
    frame_preset
        value loaded into the frame line counter when a frame timing edge is received on vo_io2.

vo_field
    f1_video_line
        line number in the frame line counter of the first active video line of field 1 of the frame.
    f2_video_line
        line number in the frame line counter of the first active video line of field 2 of the frame. if non-interlaced pictures are desired, this is programmed to the same value as f1_video_line.
    f1_olap
        overlap of the sav and eav codes from field 1 to field 2. overlap is defined as the delay in lines from start of blanking for field 2 until sav and eav codes for field 2 are emitted. typical values are +2 for 525/60 and +2 for 625/50.
    f2_olap
        overlap in lines of the sav and eav codes from field 2 to field 1. overlap is defined as the delay in lines from start of blanking for field 1 until the sav and eav codes for field 1 are emitted. typical values are +3 for 525/60 and -2 for 625/50. the negative value means field 1 blanking actually starts two lines before the end of field 2 of the previous frame. this overlap is described in table 7-4, and illustrated in figure 7-31.

vo_line
    frame_width
        total line length in pixels including blanking. also the ending value for the frame pixel counter. lines always begin with a horizontal blanking interval, and the image starts after the blanking interval and runs to the end of the line.
    video_pixel_start
        pixel number in the frame pixel counter of the starting pixel of the active video area within the line. note: must be even.

vo_image
    image_height
        video image height in lines.
    image_width
        video image line (scaled) output width in pixels. must be even for upscaling by 2x.

vo_ythr
    y_threshold
        threshold image line number in the image line counter for the ytr interrupt. can be reprogrammed on a frame-by-frame basis.
    image_voff
        image vertical offset in lines from the top of the active video window.
    image_hoff
        image horizontal offset in pixels from the start of the active video window.

vo_olstart
    ol_start_line
        starting image line of the yuv overlay within the image. zero indicates that the overlay starts at the same line as the image.
    ol_start_pixel
        starting image pixel of the yuv overlay within the image. '0' indicates that the overlay starts at the same pixel as the image. note: must be even.
    alpha_one
        alpha blend value used for yuv 4:2:2+alpha format overlays when the alpha bit = 1.

vo_olhw
    overlay_height
        height of the yuv overlay image in lines. note: the height of the overlay should be chosen such that it does not extend beyond the image area.
    overlay_width
        width of the yuv overlay image in pixels. note: must be even.
    alpha_zero
        alpha blend value used for yuv 4:2:2+alpha format overlays when the alpha bit = 0.

vo_yadd
    y_base_adr, bfr1base_adr
        y-component buffer address or buffer 1 address.
        • in video-refresh modes: y-component starting byte address.
        • in data-streaming and message-passing modes: buffer 1 starting byte address. note: must be 64-byte aligned in data-streaming mode and 4-byte aligned in message-passing mode.

vo_uadd
    u_base_adr, bfr2base_adr
        u-component buffer address or buffer 2 address.
        • in video-refresh modes: u-component starting byte address.
        • in data-streaming mode: buffer 2 starting byte address; must be 64-byte aligned.
        • not used in message-passing mode.

vo_vadd
    v_base_adr, size1
        v-component buffer address or buffer 1 length.
        • in video-refresh modes: v-component starting byte address.
        • in data-streaming and message-passing modes: buffer 1 length in bytes. note: must be a multiple of 64 in data-streaming mode. size1 is limited to 24 bits.

vo_oladd
    ol_base_adr, size2
        overlay-image buffer address or buffer 2 length.
        • in video-refresh modes: overlay-image starting byte address. ol_base_adr can be reprogrammed on a frame-by-frame basis.
        • in data-streaming mode: buffer 2 length in bytes. note: must be a multiple of 64 in data-streaming mode; not used in message-passing mode.

vo_vuf
    u_offset
        offset in bytes from start of one line to start of next line (16 bits, unsigned).
    v_offset
        offset in bytes from start of one line to start of next line (16 bits, unsigned).

vo_yolf
    y_offset
        offset in bytes from start of one line to start of next line (16 bits, unsigned).
    ol_offset
        offset in bytes from start of one line to start of next line (16 bits, unsigned).
7.16.4 evo control register (evo_ctl)

pnx1300 evo features are enabled by setting the appropriate fields of the evo_ctl register shown in figure 7-30. the register fields are described in table 7-9. if features are enabled, the new pnx1300 functionality replaces tm-1000 functions.

the hardware reset value of the evo_ctl register is 0x10000000, which means that evo functions are disabled on reset and must be enabled by software. the four most-significant bits indicate the evo revision number.

to ensure compatibility with future devices, any undefined mmio bits should be ignored when read, and written as '0's.

figure 7-30. evo mmio registers. (register map. offsets are relative to mmio_base; fields are listed per register, without bit positions.)

register     access   offset      fields
evo_ctl      r/w      0x10 1840   revision bits (shown as 1 0 0 0), full_blending, clipping_enable, sync_streaming, field_sync, genlock, key_enable, evo_enable, reserved
evo_mask     r/w      0x10 1844   mask_y, mask_uv, reserved
evo_clip     r/w      0x10 1848   higher_clipuv, lower_clipuv, higher_clipy, lower_clipy
evo_key      r/w      0x10 184c   key_y, key_v, key_u, reserved
evo_slvdly   r/w      0x10 1850   slave_dly, reserved

table 7-9. evo_ctl register fields

evo_ctl
    evo_enable
        when set to 1, evo features are enabled. when set to 0 (the hardware reset value), the evo behaves exactly like a tm-1000 vo unit. default: 0.
    full_blending
        activates full 8-bit alpha blending when set to 1. when set to 0, only the original five tm-1000 blending levels are implemented (0%, 25%, 50%, 75%, 100%). default: 0.
    clipping_enable
        when set to 1, the values stored in evo_clip are used for the clipping of output data. otherwise, tm-1000 default values (240 and 16 for y, u and v) are used. default: 0.
    sync_streaming
        when set to 1 in data-streaming mode, vo_io2 generates a data_valid signal. see section 7.18.2, "data-transfer modes". default: 0.
    field_sync
        when set, vo_io2 generates a frame synchronization signal that follows the field number in the sav/eav codes (field 1 gives a low vo_io2, field 2 gives a high vo_io2). default: 0.
    genlock
        activates genlock mode when set to 1 and vo_ctl.sync_master = 0. default: 0.
    key_enable
        when set, this bit activates chroma keying. the overlay values (y, u and v) are compared to the values stored in the evo_key register. bits that correspond to bits set in mask_y and mask_uv are ignored for the comparison. when there is an exact match between the pixel value and the value in the evo_key register (less the bits selected by mask_y and mask_uv), then the overlay value is not present in the output stream, resulting in full transparency. the key is 24 bits (y, u and v are 8 bits each). default: 0.
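The key_enable comparison can be illustrated with a small sketch. The function names `component_matches` and `overlay_is_transparent` are hypothetical, and the sketch assumes that mask_y and mask_uv are 4-bit masks applied to the four lower bits of each component, as the text describes; the hardware implementation may differ in detail.

```python
def component_matches(value, key, mask):
    """Compare one 8-bit overlay component with its key field.

    Bits that are set in the 4-bit mask are ignored in the comparison.
    """
    ignored = mask & 0x0F
    return ((value ^ key) & 0xFF & ~ignored) == 0


def overlay_is_transparent(pixel, key, mask_y=0, mask_uv=0):
    """True when the overlay pixel (y, u, v) matches the key (key_y, key_u, key_v)."""
    y, u, v = pixel
    key_y, key_u, key_v = key
    return (component_matches(y, key_y, mask_y)
            and component_matches(u, key_u, mask_uv)
            and component_matches(v, key_v, mask_uv))
```

With mask_y = mask_uv = 1, an overlay pixel that differs from the key only in the lsb of a component still counts as a match, which is the widening of the key range described in section 7.15.2.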
philips semiconductors enhanced video out preliminary specification 7-21 7.16.5 evo-related registers as shown in figure 7-30 , four additional registers are in- troduced in the pnx1300, as follows. ? evo_mask and evo_key ? used in chroma key (see section 7.15.2 ). ? evo_clip ? provides programmable clipping (see section 7.15.3 ). ? evo_slvdly ? used in genlock mode (see section 7.10 ). these registers are shown in figure 7-30 , and their reg- ister fields are shown in table 7-10 . to ensure compat ibility with future devices, any unde- fined mmio bits should be ignored when read, and writ- ten as ?0?s. 7.17 enhanced video out operation as described in section 7.14 , the evo operates in either video-refresh or data-transfer modes. the dspcpu starts the evo by setting the appropriate vo mmio reg- isters and the appropriate evo mmio registers. vo_ctl. mode must be set to the appropriate transfer mode, appropriate addresses, address offsets, and im- age timing registers and the associated control bits in the control register must be set. lastly, software sets vo_ctl. vo_enable to beg in evo operation. the evo transfers the image, data, or message as com- manded. in video-refresh and data-streaming modes, the evo runs continuously. in message-passing mode, the evo runs only until the message has been trans- ferred. the evo unit is reset by a pnx1300 hardware reset, or by a software reset, as described in table 7-7 for the re- set bit. the vo_clk signal is normally set as an output to drive the data transfer for all modes at a programmable rate. the vo_clk signal can be an input or output, as con- trolled by the vo_ctl. clkout bit. when clkout = 1, vo_clk is an output, and its frequency is set by the vo_clock register value. when clkout = 0, vo_clk is an input and the evo gener- ates data at the cloc k rate of the sender. 
in video-refresh modes, the evo receives or generates horizontal and frame synchronization signals on the vo_io1 and vo_io2 lines, as described in section 7.9.4.

7.17.1 video refresh modes

in video-refresh mode, the evo transfers an image from sdram to the evo port. the vo_ctl.mode field defines the video image memory data format and determines whether the evo is to perform horizontal upscaling (see table 7-5). the evo accepts memory image data in yuv 4:2:2 co-sited, yuv 4:2:2 interspersed and yuv 4:2:0 formats, and generates a ccir 656-compatible, yuv 4:2:2 co-sited image output stream. scaling is identified by the yuv-1 and yuv-2 modes. in yuv-1 modes, luminance and chrominance pass unmodified. in yuv-2 modes, luminance and chrominance are horizontally upscaled by a factor of two.

during video refresh, the vo_status.ytr bit is set when the image line counter reaches the y_threshold value. when an image field has been transferred, the vo_status.bfr1_empty bit is set. the dspcpu is interrupted when either the ytr or bfr1_empty flag is set and its corresponding interrupt is enabled.

to maintain continuous transfer of image fields, the dspcpu supplies new pointers for the next field following each bfr1_empty interrupt. if the dspcpu does not supply new pointers before the next field, the urun bit is set, and the evo uses the same pointer values until they are updated.

table 7-10. evo-related mmio register fields

register evo_mask
  mask_y: this 4-bit value is used to mask the four lower bits of the overlay y component during the chroma key process. example: setting mask_y to '1' eliminates the influence of the lsb of key_y in the keying process.
  mask_uv: this 4-bit value is used to mask the four lower bits of the overlay u and v components during the chroma key process. example: setting mask_uv to '1' eliminates the influence of the lsb of key_u and key_v in the keying process.
register evo_clip
  lower_clipy: a y value less than or equal to lower_clipy is forced to lower_clipy. default: 16.
  higher_clipy: a y value greater than or equal to higher_clipy is forced to higher_clipy. default: 235.
  lower_clipuv: a u or v value less than or equal to lower_clipuv is forced to lower_clipuv. default: 16.
  higher_clipuv: a u or v value greater than or equal to higher_clipuv is forced to higher_clipuv. default: 240.
register evo_key
  key_y: value compared to the y component of the overlay for chroma keying.
  key_u: value compared to the u component of the overlay for chroma keying.
  key_v: value compared to the v component of the overlay for chroma keying.
register evo_slvdly
  number of vo_clk cycles of internal delay for vo_io2 in genlock mode.
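the chroma key comparison and the programmable clipping described in table 7-10 can be sketched in a few lines of python. this is only an illustrative software model, not the hardware implementation: the function names are hypothetical, and it assumes that a set bit in mask_y or mask_uv removes the corresponding low-order bit from the comparison, as the table's lsb example suggests.

```python
def chroma_key_match(pixel, key, mask_y=0, mask_uv=0):
    """Return True when an overlay pixel matches the key (pixel is transparent).

    pixel and key are (y, u, v) tuples of 8-bit values. mask_y and mask_uv
    are 4-bit values; a set mask bit excludes that low-order bit from the
    comparison (assumed interpretation of table 7-10).
    """
    y_care = 0xFF & ~(mask_y & 0x0F)     # bits of y that must match
    uv_care = 0xFF & ~(mask_uv & 0x0F)   # bits of u and v that must match
    y, u, v = pixel
    key_y, key_u, key_v = key
    return (((y ^ key_y) & y_care) == 0
            and ((u ^ key_u) & uv_care) == 0
            and ((v ^ key_v) & uv_care) == 0)


def clip(value, lower, higher):
    """Force value into [lower, higher], as evo_clip does per component."""
    return max(lower, min(higher, value))


if __name__ == "__main__":
    key = (0x80, 0x40, 0x40)
    # with mask_y = 1 the lsb of y is ignored, so 0x81 still matches 0x80
    print(chroma_key_match((0x81, 0x40, 0x40), key, mask_y=1))
    # default luminance clipping range from table 7-10
    print(clip(5, 16, 235), clip(250, 16, 235))
```

with mask_y and mask_uv left at 0 the comparison is exact on all 24 bits.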
graphics overlay

the graphics overlay is enabled by the vo_ctl.ol_en bit. the graphics overlay is typically a software-generated graphic overlaid onto the output video image stream. the graphics overlay is either generated in yuv by the dspcpu or converted by the dspcpu from an rgb to a yuv overlay image. because rgb-to-yuv conversion can lose information, the conversion is left to the dspcpu, which has the most information about how best to perform it.

the overlay height should be chosen such that the overlay does not vertically extend beyond the image area. a height greater than this causes undefined results and may result in vertical overlay wraparound.

note: the emitted byte data rate is limited to 45% of the sdram clock when overlays are enabled.

the yuv overlay logic assembles the u0, y0, v0, y1 bytes for a pair of yuv 4:2:2 pixels for both the main image and the overlay image. the alpha bit for pixel 0 (the lsb of the u0 byte of the overlay image) selects alpha_zero or alpha_one as the alpha source, and the alpha blend logic combines u0, y0, and v0 from the main and overlay images to generate the u0, y0 and v0 output values. the alpha bit for pixel 1 (the lsb of the v0 byte of the overlay image) selects alpha_zero or alpha_one as the alpha source for blending the y1 pixels to generate the y1 output value. the alpha-blended u0, y0, v0 and y1 bytes are sent to the evo output port in the yuv 4:2:2 sequence. the overlay u and v values used assume an lsb of zero.

video image addressing

the output image is read from sdram at a location defined by y_base_adr, y_offset, u_base_adr, u_offset, v_base_adr, and v_offset. the default memory packing is big-endian, although little-endian packing is also supported by setting the vo_ctl.ltl_end bit.
horizontally-adjacent samples are stored at successive byte addresses, resulting in a packed form (four 8-bit samples are packed into one 32-bit word). upon horizontal retrace, the starting byte address for the next line is computed by adding the corresponding offset value to the previous line's starting byte address. note that {ol,y,u,v}_offset values are 16-bit unsigned quantities. this process continues until the total image (height in lines and width in pixels per line) has been read from memory for luminance (y). for chrominance, the same number of lines are read, but half the number of pixels per line are read in yuv 4:2:2 and yuv 4:2:0 formats (see note 1). the yuv 4:2:0 format has half the number of u and v lines in memory that the yuv 4:2:2 formats have, but each line of u and v data is read and used twice. see figure 7-19 through figure 7-22.

[figure 7-31. evo frame timing: line-number diagrams for 525 line / 60 hz (markers at lines 4, 20, 264, 266, 283 and 525) and 625 line / 50 hz (markers at lines 1, 23, 311, 313, 336, 623, 624 and 625), showing the blanking, overlap blanking and video image regions of field 1 and field 2.]

note 1: consecutive pixel components of each line are stored in consecutive memory addresses, but consecutive lines need not be in consecutive memory addresses.
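the line addressing rule above can be illustrated with a small python sketch. it is a model of the address arithmetic only, under stated assumptions: the function names are hypothetical, the first line is assumed to start at the base address itself, and for yuv 4:2:0 each stored chrominance line is assumed to serve two consecutive output lines.

```python
def line_start_addresses(base_adr, offset, image_height):
    """Starting byte address of each line read from sdram.

    Each line starts `offset` bytes after the previous one; `offset` is a
    16-bit unsigned quantity, so lines need not be contiguous in memory.
    """
    if not 0 <= offset <= 0xFFFF:
        raise ValueError("offset must fit in 16 unsigned bits")
    return [base_adr + n * offset for n in range(image_height)]


def chroma_line_start_addresses(base_adr, offset, image_height, yuv420=False):
    """Starting address of the u (or v) line used for each output line.

    In yuv 4:2:0 memory holds half as many chroma lines, and each stored
    line is read and used twice (assumed: for two consecutive output lines).
    """
    if not 0 <= offset <= 0xFFFF:
        raise ValueError("offset must fit in 16 unsigned bits")
    step = 2 if yuv420 else 1
    return [base_adr + (n // step) * offset for n in range(image_height)]


if __name__ == "__main__":
    # three luminance lines, 768 bytes apart
    print([hex(a) for a in line_start_addresses(0x1000, 768, 3)])
    # four output lines fed from two stored 4:2:0 chroma lines
    print([hex(a) for a in chroma_line_start_addresses(0x2000, 384, 4, yuv420=True)])
```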
7.18 frame and field timing control

the frame timing for the 525/60 and 625/50 timing cases is shown pictorially in figure 7-31. ccir 656 line definitions are used.

7.18.1 recommended values for timing registers

the recommended values for the various fields of the timing registers are shown in table 7-11 for the 525/60 and 625/50 timing cases. the frequency field value shown is for 27 mhz, assuming a dspcpu clock of 143 mhz.

7.18.2 data-transfer modes

in data-streaming and message-passing modes, the evo supplies a stream of 8-bit data to the vo_data[7:0] lines at rates up to 81 mhz.

note: in the pnx1300, the data rate is limited to an 81-mhz evo clock.

data is read from sdram in packed form (four 8-bit bytes per 32-bit word). no data selection or data interpretation is done, and data is transferred at one byte per vo_clk from successive byte addresses.

note: unused bits of the evo mmio registers must be set to 0 when operating in data-transfer modes.

data-streaming mode. in data-streaming mode, data is stored in sdram in two buffers. when the evo has transferred out the contents of one buffer, it interrupts the dspcpu and begins transferring out the contents of the second buffer. the dspcpu supplies pointers to both buffers. the evo can provide a continuous stream of data to the evo output if the dspcpu updates the pointer to the next buffer before the evo starts transferring data from the next buffer.

note: in this mode, sync_master must be set to ensure correct operation of vo_io1 and vo_io2 as outputs.

when each buffer has been transferred, the corresponding buffer-empty bit is set in the status register, and the dspcpu is interrupted if the buffer-empty interrupt is enabled. to maintain continuous transfer of data, the dspcpu supplies new pointers for the next data buffer following each buffer-empty interrupt.
if the dspcpu does not supply new pointers before the next field, the urun bit is set, and the evo uses the same pointer values until they are updated.

when data-streaming mode is enabled and evo_enable = 1 and sync_streaming = 1, the vo_io2 signal indicates a data-valid condition. this signal is asserted when the evo starts outputting valid data (that is, data-streaming mode is enabled and video output is running) and is de-asserted when data-streaming mode is disabled. the vo_io1 signal generates a pulse one vo_clk cycle before the first valid data is sent. see section 7.11 for timing signal details.

message-passing mode. in message-passing mode, data is stored in sdram in one buffer.

note: in this mode, sync_master must be set to ensure correct operation of vo_io1 and vo_io2 as outputs.

when message passing is started by setting vo_ctl.vo_enable, the evo sends a start condition on vo_io1. when the evo has transferred the contents of the buffer, it sends an end condition on vo_io2 as shown in figure 7-18, sets bfr1_empty, and interrupts the dspcpu. the evo stops, and no further operation takes place until the dspcpu sets vo_enable again to start another message, or until the dspcpu initiates other evo operation. see section 7.11 for timing signal details.

7.18.3 interrupts and error conditions

the evo has five interrupt conditions defined by bits in the vo_status register: bfr1_empty, bfr2_empty, hbe, urun, and ytr. each of these conditions has a corresponding interrupt enable flag and interrupt acknowledge bit in the vo_ctl register. the evo asserts a source 10 interrupt request to the pnx1300 vectored interrupt controller as long as one or more enabled events is asserted.

note: the interrupt controller should always be programmed such that the evo interrupt operates in level-triggered mode. this ensures that no evo events can be lost to the interrupt handler. refer to section 3.5.3, "int and nmi (maskable and non-maskable interrupts),"
for a description of setting level-triggered mode, as well as for recommendations on writing interrupt handlers.

the bfr1_empty, bfr2_empty and ytr status flags indicate to the dspcpu that a buffer has been emptied or that the y threshold has been reached.

the buffer-underrun (urun) status flag indicates that the dspcpu did not acknowledge a bfr1_empty or bfr2_empty interrupt before the evo required the next buffer. in this case, the evo uses the old address pointer value and continues image or data transfer. when the dspcpu updates the pointer, the new pointer value will be used at the start of the next frame or buffer transfer. therefore, the urun flag can be interpreted as indicating to the dspcpu that the evo is using its old pointer values because it did not receive the new ones in time.

table 7-11. timing register recommended values

register   field              525/60 value   625/50 value
vo_clock   frequency          0x855ee191     0x855ee191
vo_frame   frame_length       525            625
           field_2_start      264            311
           frame_preset       1              1
vo_field   f1_video_line      20             23
           f2_video_line      283            336
           f1_olap            2              2
           f2_olap            3              -2 (0xe)
vo_line    frame_width        858            864
           video_pixel_start  138            144
vo_image   image_height       240            288
           image_width        720            720 (704 visible)

note: the actual buffer pointer write operation to the mmio registers is not seen by the hardware; only writing a '1' to the appropriate bfr1_ack or bfr2_ack bit signals buffer availability.

the hardware bandwidth error (hbe) flag indicates that the evo did not get data from sdram via the pnx1300's internal data highway in time to continue data transfer or video refresh. data transfer or video refresh will continue using whatever data is in the evo internal data buffers. the address counter for the failing buffer(s) will continue to count, and the evo will continue to request data from the sdram over the highway.

the evo is a read-only device, transferring data from sdram to the evo output port. unlike video in, the evo does not modify sdram data. urun and hbe are the only evo error conditions that can arise. in the case of urun or hbe, a scrambled image may be temporarily displayed or incorrect data may be temporarily sent. the evo can cause no other system hardware error conditions.

even changing operating modes cannot cause system hardware error conditions to arise. for example, changing the mode bits, the ol_en and format bits, or the ltl_end bit while the evo is running may cause wrong data to be displayed or transferred. however, the evo does not detect this or stop for it.

in normal operation, the user should not change the mode or transfer-control bits while the evo is enabled.
the evo should be disabled before changing bits such as the mode bits, the ol_en bit, or the ltl_end bit. however, if these bits are changed while the evo is running, they will take effect at the beginning of the next field or buffer.

7.18.4 latency and bandwidth requirements

in order to avoid hardware bandwidth error (hbe) conditions, the internal highway bus arbiter (see chapter 20, "arbiter") must be programmed according to the latency requirements of the evo unit described in this section.

in the following discussion, it is assumed that data for video lines (in y, u, v and overlay planar memory format) is stored in memory aligned on 64-byte boundaries. in other words, the {ol,y,u,v}_offset fields are multiples of 64 bytes. otherwise, internal evo arbitration for ol, y, u and v requests will be different than described here, and the following latencies would not be guaranteed. the evo uses internal 64-byte buffers.

1. the latency requirement for the evo in image mode 4:2:2 or 4:2:0, co-sited or interspersed, without upscaling and with overlay disabled is as follows. during 128 evo clock cycles, the evo block must have 2 requests acknowledged, that is, ([2 ys, 1 u and 1 v] / 2). for example, if the evo clock is 27 mhz, then the evo must get two requests (128 bytes) from sdram in 128 / 0.027 = 4740 ns. the byte bandwidth b_1x per video line within the active image for this case is:

$$ b_{1x} = \left[ \operatorname{ceil}\left(\frac{w}{64}\right) + \operatorname{ceil}\left(\frac{w}{128}\right) \times 2 + 4 \right] \times 64 $$

where ceil(x) is a function returning the least integral value greater than or equal to x, and w is the image_width field value.

2. in the same modes but with overlay enabled, the latency is as follows:
• during the first 64 evo clock cycles, at least one request must be acknowledged for the ol data.
• during 128 evo clock cycles, the evo unit must have 4 requests acknowledged ([4 ols, 2 ys, 1 v and 1 u] / 2).
for example, if the evo clock runs at 54 mhz, then the evo must get the first request from sdram in 64 / 0.054 = 1185 ns and must average 4 requests in 128 / 0.054 = 2370 ns. the byte bandwidth b_1x,ol per video line within the active image is then as follows:

$$ b_{1x,ol} = b_{1x} + \left[ \operatorname{ceil}\left(\frac{w}{32}\right) + 4 \right] \times 64 $$

3. when the evo is set to image mode with 2× upscaling, the latency requirements are multiplied by a factor of 2. for example, if the 1× mode called for one request per 64 evo clock cycles, the latency becomes one request per 128 evo clock cycles. bandwidth is roughly divided by 2:

$$ b_{2x} = \left[ \operatorname{ceil}\left(\frac{w}{128}\right) + \operatorname{ceil}\left(\frac{w}{256}\right) \times 2 + 4 \right] \times 64 $$

$$ b_{2x,ol} = b_{2x} + \left[ \operatorname{ceil}\left(\frac{w}{64}\right) + 4 \right] \times 64 $$

4. latency for data-streaming mode or message-passing mode is as follows: during 64 evo clock cycles, the evo unit must get one request from sdram. for example, if the evo clock runs at 38 mhz, then the latency is 64 / 0.038 = 1684 ns and the bandwidth is 38 mb/s.

7.18.5 power down and sleepless

the evo block enters the power down state whenever the pnx1300 is put in global power down mode, except if the sleepless bit in vo_ctl is set. in the latter case, the block continues dma operation and will wake up the dspcpu whenever an interrupt is generated.
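the latency examples of section 7.18.4 are plain clock-period arithmetic and can be checked with the python sketch below. the function names are hypothetical. the per-line bandwidth functions are included only as an assumption: they encode one reading of the data book's bandwidth formulas (y, u and v fetched in 64-byte requests, plus a fixed term of 4 requests), so treat their results as illustrative rather than authoritative.

```python
import math


def latency_ns(evo_clock_cycles, evo_clock_mhz):
    """Time available, in ns, for the given number of evo clock cycles."""
    return evo_clock_cycles / evo_clock_mhz * 1000.0


def bytes_per_line(w, upscale_2x=False, overlay=False):
    """Bytes fetched per active video line for image_width w (assumed formula).

    y needs ceil(w/64) requests, u and v need ceil(w/128) each, the overlay
    needs ceil(w/32); 2x upscaling halves the amount of source data.
    """
    s = 2 if upscale_2x else 1
    requests = math.ceil(w / (64 * s)) + 2 * math.ceil(w / (128 * s)) + 4
    if overlay:
        requests += math.ceil(w / (32 * s)) + 4
    return requests * 64  # each request returns 64 bytes


if __name__ == "__main__":
    print(int(latency_ns(128, 27)))   # two requests at 27 mhz
    print(int(latency_ns(64, 54)))    # first overlay request at 54 mhz
    print(int(latency_ns(64, 38)))    # data-streaming at 38 mhz
    print(bytes_per_line(720), bytes_per_line(720, overlay=True))
```

truncating the results to whole nanoseconds reproduces the 4740, 1185, 2370 and 1684 ns figures quoted in the text.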
the evo block can be separately powered down by setting a bit in the block_power_down register. refer to chapter 21, "power management."

it is recommended that the evo be stopped (by negating vo_ctl.enable) before block-level power down is started, or that sleepless mode be used when global power down is activated.

7.19 dds and pll filter details

the pll filter reduces the phase jitter of the dds synthesizer output. it can also be used to multiply the dds output frequency by 2. the dds and pll filter together provide a high-quality, accurately-programmable output video clock.

the pll filter block is shown in figure 7-32. at hardware reset, the output multiplexer is set to 0x3, and the pll system is disabled. to start the pll system, the following steps must be performed:

1. assign a dds frequency. this starts the dds. allow at least 31 dspcpu cycles for the dds frequency setting to take effect.
2. choose a value for pll_s and pll_t. for 8-40 mhz operation, a value of 1 (which selects division by 2) is recommended.
3. choose a value for clock_select. for 8-81 mhz operation, clock_select = 00 is recommended.
4. assign values to the vo_ctl register containing the above choices. the first assignment with clock_select not equal to 0x3 enables the pll system. allow a maximum of 50 microseconds to achieve lock.

once the pll is locked, small changes to the dds frequency are allowed, and the vo_clk output will smoothly track the frequency change.

note: most consumer electronics equipment imposes very high precision requirements on the value of the color burst frequency. a video encoder will derive the color burst frequency from vo_clk. when changing the vo_clk frequency in software to phase-lock the evo to a master reference, special care is required to keep the color burst signal frequency within a tolerance of about 50 ppm.
when using a philips denc (digital encoder), the color burst frequency is derived from the master denc frequency by a programmable synthesizer on the denc chip. in this case, vo_clk changes larger than 50 ppm are allowed by changing the denc synthesizer over its i2c interface to compensate for the vo_clk change.

table 7-12 illustrates recommended settings.

[figure 7-32. pll filter block diagram: square-wave dds (frequency bits 31..0, clocked from 9 × cpu clock), div s+1 (pll_s), phase detect, loop filter, vco (8-90 mhz), div t+1 (pll_t), and an output multiplexer controlled by clock_select (00, 01, 10, 11) that drives vo_clk (gated by clkout) and the internal vo_clk to the frame timing generator.]

table 7-12. dds and pll example settings

desired frequency   dds frequency   pll_s             pll_t             clock_select     usage
4-10 mhz            8-20 mhz        1 (divide by 2)   1 (divide by 2)   01 (t divider)   custom low speed video
8-45 mhz            8-45 mhz        1 (divide by 2)   1 (divide by 2)   00 (vco)         standard or 16:9 digital video
40-81 mhz           20-40.5 mhz     1 (divide by 2)   3 (divide by 4)   00 (vco)         high pixel rate custom video
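the rows of table 7-12 are consistent with a simple model of the pll filter, sketched below in python. this is an assumption drawn from the block diagram and the table, not a statement from the data book: the loop is taken to lock the vco output divided by (pll_t + 1) to the dds output divided by (pll_s + 1). the function name is hypothetical, and only the two clock_select codes that appear in the table (00 and 01) are modelled.

```python
def vo_clk_mhz(dds_mhz, pll_s, pll_t, clock_select):
    """Estimate the vo_clk frequency for a given dds and pll setting.

    Assumed model: vco / (pll_t + 1) == dds / (pll_s + 1) when locked.
    clock_select 0b00 outputs the vco, 0b01 outputs the t divider.
    """
    vco_mhz = dds_mhz * (pll_t + 1) / (pll_s + 1)
    if clock_select == 0b00:
        return vco_mhz
    if clock_select == 0b01:
        return vco_mhz / (pll_t + 1)
    raise ValueError("only clock_select 00 and 01 are modelled here")


if __name__ == "__main__":
    print(vo_clk_mhz(20.0, 1, 1, 0b01))   # custom low speed video row
    print(vo_clk_mhz(27.0, 1, 1, 0b00))   # standard digital video
    print(vo_clk_mhz(40.5, 1, 3, 0b00))   # high pixel rate row
```

under this model the upper ends of the three table rows come out at 10, 45 and 81 mhz, which is what the "desired frequency" column lists.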
chapter 8 audio in

by gert slavenburg

8.1 audio in overview

in this document, the generic pnx1300 name refers to the pnx1300 series, or the pnx1300/01/02/11 products.

the pnx1300 audio in (ai) unit connects to an off-chip stereo a/d converter subsystem through a flexible bit-serial connection. the ai unit provides all signals needed to interface to high quality, low cost oversampling a/d converters, including a generator for a precisely programmable oversampling a/d system clock. together, the ai unit and external a/d provide the following capabilities:
• one or two channels of audio input.
• 8- or 16-bit samples per channel.
• programmable sampling rate.
• internal or external sampling clock source.
• autonomous writes of sampled audio data to memory using double buffering (dma).
• 8-bit mono and stereo as well as 16-bit mono and stereo pc standard memory data formats.
• little- and big-endian memory formats.

8.2 external interface

four pnx1300 pins are associated with the ai unit. the ai_osclk output is an accurately programmable clock output intended to serve as the master system clock for the external a/d subsystem. the other three pins (ai_sck, ai_ws and ai_sd) constitute a flexible serial input interface. using the ai unit's mmio registers, these pins can be configured to operate in a variety of serial interface framing modes, including but not limited to:
• standard stereo i2s (msb first, 1-bit delay from ai_ws, left & right data in a frame) (see note 1).
• lsb first with 1-16 bit data per channel.
• complex serial frames of up to 512 bits/frame, with a "valid sample" qualifier bit.

the ai unit can be used with many serial a/d converter devices, including the philips saa7366 (stereo a/d), crystal semiconductor cs5331 and cs5336 (stereo a/ds), cs4218 (codec), and analog devices ad1847 (codec).

note 1: a definition of the philips i2s serial interface protocol, among others, can be found in the philips ic01 databook.

table 8-1. ai unit external signals

ai_osclk (type: out)
  over-sampling clock. this output can be programmed to emit any frequency up to 40 mhz with sub-hertz resolution. it is intended for use as the 256fs or 384fs over-sampling clock by the external a/d subsystem.
ai_sck (type: i/o-5)
  • when the ai unit is programmed as serial-interface timing slave (power-up default), ai_sck is an input. ai_sck receives the serial bit clock from the external a/d subsystem. this clock is treated as fully asynchronous to the pnx1300 main clock.
  • when the ai unit is programmed as the serial-interface timing master, ai_sck is an output. ai_sck drives the serial clock for the external a/d subsystem. the frequency is a programmable integral divide of the ai_osclk frequency.
  ai_sck is limited to 22 mhz. the sample rate of valid samples embedded within the serial stream is also limited by the bandwidth/latency available in the system (section 8.10).
ai_sd (type: in-5)
  serial data from the external a/d subsystem. data on this pin is sampled on positive or negative edges of ai_sck as determined by the clock_edge bit in the ai_serial register.
ai_ws (type: i/o-5)
  • when the ai unit is programmed as the serial-interface timing slave (power-up default), ai_ws acts as an input. ai_ws is sampled on the same edge as selected for ai_sd.
  • when the ai unit is programmed as the serial-interface timing master, ai_ws acts as an output. it is asserted on the opposite edge of the ai_sd sampling edge.
  ai_ws is the word-select or frame-synchronization signal from/to the external a/d subsystem.
8.3 clock system

figure 8-1 illustrates the different clock capabilities of the ai unit. at the heart of the clock system is a square wave dds (direct digital synthesizer). the dds can be programmed to emit frequencies from approx. 1 hz to 40 mhz with a resolution of better than 0.3 hz.

[figure 8-1. ai clock system and i/o interface: a square wave dds clocked from 9 × dspcpuclk and programmed by frequency[31:0] drives ai_osclk (e.g. 256fs); a div n+1 stage (sckdiv[7:0]) derives ai_sck (e.g. 64fs) and a div n+1 stage (wsdiv[8:0]) derives ai_ws, both under control of ser_master; ai_sd feeds the serial to parallel converter, which produces left[15:0], right[15:0] and sample_clock.]

the output of the dds is always sent on the ai_osclk output pin. this output is intended to be used as the 256fs or 384fs system clock source instead of a fixed frequency crystal for oversampling a/d converters, such as the philips saa7366t or analog devices ad1847.

the pnx1300 ai dds frequency is set by writing to the frequency mmio register. the programmer can change the frequency setting dynamically, so as to adjust the input sampling rate to track an application-dependent master reference. depending on bit 31 (msb), the dds runs in one of two modes:
• bit 31 = 1 (pnx1300 improved mode)
• bit 31 = 0 (tm-1000 compatibility mode)

8.3.1 pnx1300 improved mode

in improved mode, a high quality, low-jitter ai_osclk is generated. the setting of the frequency register to accomplish a given ai_osclk frequency is given by:

$$ \text{frequency} = 2^{31} + \frac{f_{osclk} \cdot 2^{32}}{9 \cdot f_{dspcpu}} $$

this mode, and the above formula, should be used for all new software development on pnx1300. it is not available on tm-1000.

in the improved mode, the dds synthesizer maximum jitter can be computed as follows:

$$ \text{jitter} = \frac{1}{9 \cdot f_{dspcpu}} $$

examples of jitter values can be found in table 8-2.

table 8-2. jitter values for common dspcpu clock frequencies

f_dspcpu (mhz)   jitter (nsec)
143              0.777
166              0.669
180              0.617
200              0.555

8.3.2 tm-1000 compatibility mode

tm-1000 compatibility mode is provided so that tm-1000 software runs without changes. it should not be used for new pnx1300 software development. tm-1000 mode is automatically entered whenever frequency[31] = 0. in tm-1000 mode, the ai_osclk frequency is set as follows:

$$ \text{frequency} = \frac{f_{osclk} \cdot 2^{32}}{3 \cdot f_{dspcpu}} $$

8.4 clock system operation

ai_sck and ai_ws can be configured as input or output, as determined by the ser_master control field. as output, ai_sck is a divider of the dds output frequency:

$$ f_{aisck} = \frac{f_{aiosclk}}{\text{sckdiv} + 1}, \qquad \text{sckdiv} \in [0, 255] $$

whether input or output, the ai_sck pin signal is used as the bit clock for serial-parallel conversion. if set as output, ai_ws can similarly be programmed using wsdiv to control the serial frame length from 1 to 512 bits.

the preferred application of the clock system options is to use ai_osclk as a/d master clock, and let the a/d converter be timing master over the serial interface (ser_master=0).

in case an external codec (e.g. the ad1847 or cs4218) is used for common audio i/o, it may not be possible to independently control the a/d and d/a system clocks. in that case it is recommended that the audio out (ao) unit
clock system dds is used to provide a single master a/d and d/a clock. the ao unit, or the d/a converter, can be used as serial interface timing master, and the ai unit is set to be slave to the serial frame determined by ao (ai ser_master=0, ai_sck and ai_ws externally wired to the corresponding ao pins). in such systems, independent software control over a/d and d/a sampling rate is not possible, but component count is minimized.

8.5 serial data framing

the ai unit can accept data in a wide variety of serial data framing conventions. figure 8-2 illustrates the notion of a serial frame. if polarity=1 and clock_edge=0, a frame is defined with respect to the positive transition of the ai_ws signal, as observed by a positive clock transition on ai_sck. each data bit sampled on positive ai_sck transitions has a specific bit position: the data bit sampled on the clock edge after the clock edge on which the ai_ws transition is seen has bit position 0. each subsequent clock edge defines a new bit position. as defined in table 8-5, other combinations of polarity and clock_edge can be used to define a variety of serial frame bit position definitions.

the capturing of samples is governed by framemode. if framemode=00, every serial frame results in one sample from the serial-parallel converter. a sample is defined as a left/right pair in stereo modes or a single left channel value in mono modes. if framemode=1y, the serial frame data bit in bit position validpos is examined. if it has value 'y', a sample is taken from the data stream (the valid bit is allowed to precede or follow the left or right channel data provided it is in the same serial frame as the data).

the left and right sample data can be in lsb-first or msb-first form, at an arbitrary bit position, and with an arbitrary length.

table 8-3. sample rate settings (f_dspcpuclk = 133 mhz, improved pnx1300 mode)

fs         osclk   sck    frequency    sckdiv
44.1 khz   256fs   64fs   2187991971   3
48.0 khz   256fs   64fs   2191574340   3
44.1 khz   384fs   64fs   2208246133   5
48.0 khz   384fs   64fs   2213619686   5

table 8-4. ai mmio clock & interface control bits

ser_master
  0: (reset default) the a/d converter is the timing master over the serial interface. ai_sck and ai_ws are set to be inputs.
  1: pnx1300 is timing master over the ai serial interface. the ai_sck and ai_ws pins are set to be outputs.
frequency
  sets the clock frequency emitted by the ai_osclk output. reset default 0.
sckdiv
  sets the divider used to derive ai_sck from ai_osclk. set to 0..255, for division by 1..256. reset default 0.
wsdiv
  sets the divider used to derive ai_ws from ai_sck. set to 0..511 for a serial frame length of 1..512. reset default 0.

[figure 8-2. ai serial frame and bit position definition (polarity=1, clock_edge=0): ai_sck, ai_ws and ai_sd waveforms across frame n and frame n+1, with the bit positions of each frame numbered from 0.]

table 8-5. ai mmio serial framing control fields

polarity
  0: serial frame starts on ai_ws negedge (reset default)
  1: serial frame starts on ai_ws posedge
framemode
  00: accept a sample every serial frame (reset default)
  01: unused, reserved
  10: accept sample if valid bit = 0
  11: accept sample if valid bit = 1
validpos
  • defines the bit position within a serial frame where the valid bit is found.
  • default 0.
leftpos
  • defines the bit position within a serial frame where the first data bit of the left channel is found.
  • default 0.
rightpos
  • defines the bit position within a serial frame where the first data bit of the right channel is found.
  • default 0.
datamode
  0: msb first (reset default)
  1: lsb first
sspos
  • start/stop bit position. default 0.
  • if datamode=msb first, sspos determines the bit index (0..15) in the parallel word of the last data bit. bits 15 (msb) up to/including sspos are taken in order from the serial frame data. all other bits are set to '0'.
  • if datamode=lsb first, sspos determines the bit index (0..15) in the parallel word of the first data bit. bits sspos up to/including 15 are taken in order from the serial frame data. all other bits are set to '0'.
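the frequency register values of table 8-3 can be reproduced from the improved-mode formula of section 8.3.1 with the python sketch below. the function names are hypothetical, and rounding to the nearest integer is an assumption (the text does not state a rounding rule; nearest-integer rounding happens to match all four table entries). the tm-1000 formula of section 8.3.2 and the sckdiv relation of section 8.4 are included for completeness.

```python
def _div_round(num, den):
    """Integer division rounded to nearest (assumed rounding rule)."""
    return (2 * num + den) // (2 * den)


def ai_frequency_improved(f_osclk_hz, f_dspcpu_hz):
    """frequency register value for pnx1300 improved mode (bit 31 = 1)."""
    return (1 << 31) + _div_round(f_osclk_hz << 32, 9 * f_dspcpu_hz)


def ai_frequency_tm1000(f_osclk_hz, f_dspcpu_hz):
    """frequency register value for tm-1000 compatibility mode (bit 31 = 0)."""
    return _div_round(f_osclk_hz << 32, 3 * f_dspcpu_hz)


def jitter_ns(f_dspcpu_hz):
    """Maximum dds jitter in improved mode, in nanoseconds."""
    return 1e9 / (9 * f_dspcpu_hz)


def sckdiv_for(osclk_multiple, sck_multiple):
    """sckdiv giving ai_sck = ai_osclk / (sckdiv + 1), e.g. 256fs -> 64fs."""
    if osclk_multiple % sck_multiple:
        raise ValueError("ai_sck must be an integral divide of ai_osclk")
    sckdiv = osclk_multiple // sck_multiple - 1
    if not 0 <= sckdiv <= 255:
        raise ValueError("sckdiv out of range 0..255")
    return sckdiv


if __name__ == "__main__":
    f_cpu = 133_000_000
    for fs, mult in ((44_100, 256), (48_000, 256), (44_100, 384), (48_000, 384)):
        print(fs, mult, ai_frequency_improved(mult * fs, f_cpu), sckdiv_for(mult, 64))
    print(round(jitter_ns(143_000_000), 3))
```

the design choice of integer arithmetic avoids floating-point rounding doubt in the 32-bit register value.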
in msb-first mode, the serial-to-parallel converter assigns the value of the bit at leftpos to left[15]. subsequent bits are assigned, in order, to decreasing bit positions in the left data word, up to and including left[sspos]. bits left[sspos-1:0] are cleared. hence, in msb-first mode, an arbitrary number of bits are captured. they are left-adjusted in the 16-bit parallel output of the converter.

in lsb-first mode, the serial-to-parallel converter assigns the value of the bit at leftpos to left[sspos]. subsequent bits are assigned, in order, to increasing bit positions in the left data word, up to and including left[15]. bits left[sspos-1:0] are cleared. hence, in lsb-first mode, an arbitrary number of bits are captured. they are returned left-adjusted in the 16-bit parallel output of the converter.

refer to figure 8-3 and table 8-6 for an example of how the ai unit mmio registers are set to collect 16-bit samples using the philips saa7366 i2s 18-bit a/d converter. this setup assumes the saa7366 acts as the serial master.

for example, if it were desirable to use only the 12 msbs of the a/d converter in figure 8-3, use the settings of table 8-6 with sspos set to '4'. this results in left[15:4] being set with data bits 0..11, and left[3:0] being set to '0'. right[15:4] is set with data bits 32..43 and right[3:0] is set to '0'.

table 8-5. ai mmio serial framing control fields (continued)

clock_edge
  • if '0' (reset default), the ai_sd and ai_ws pins are sampled on positive edges of the ai_sck pin. if ser_master=1, ai_ws is asserted on negative edges of ai_sck.
  • if '1', ai_sd and ai_ws are sampled on negative edges of ai_sck. as output, ai_ws is asserted on positive edges of ai_sck.

[figure 8-3. serial frame of the saa7366 18-bit i2s a/d converter (format 2 sws): ai_sck, ai_ws and ai_sd waveforms over a 64-bit serial frame, carrying left n (18 bits) followed by right n (18 bits), then left n+1 (18 bits) in the next frame.]

table 8-6. example setup for saa7366

field        value       explanation
ser_master   0           saa7366 is serial master
frequency    161628209   256fs, 44.1 khz
sckdiv       3           ai_sck set to ai_osclk/4 (not needed since ser_master=0)
wsdiv        63          serial frame length of 64 bits (not needed since ser_master=0)
polarity     0           frame starts with neg. ai_ws
framemode    00          take a sample each serial frame
validpos     n/a         don't care
leftpos      0           bit position 0 is msb of left channel and will go to left[15]
rightpos     32          bit position 32 is msb of right channel and will go to right[15]
datamode     0           msb first
sspos        0           stop with left/right[0]
clock_edge   0           sample ws and sd on positive sck edges for i2s

8.6 memory data formats

the ai unit autonomously writes samples to memory in mono and stereo 8- and 16-bits per sample formats, as shown in figure 8-4. successive samples are always stored at increasing memory address locations. the setting of the little_endian bit in the ai_ctl register determines how increasing memory addresses map to byte positions within words. refer to appendix c, "endian-ness," for details on byte ordering conventions.

figure 8-4. ai memory dma formats.

8-bit mono:     adr: left n,  adr+1: left n+1,  adr+2: left n+2,  adr+3: left n+3,  adr+4: left n+4,  adr+5: left n+5,  adr+6: left n+6,  adr+7: left n+7
8-bit stereo:   adr: left n,  adr+1: right n,  adr+2: left n+1,  adr+3: right n+1,  adr+4: left n+2,  adr+5: right n+2,  adr+6: left n+3,  adr+7: right n+3
16-bit mono:    adr: left n,  adr+2: left n+1,  adr+4: left n+2,  adr+6: left n+3
16-bit stereo:  adr: left n,  adr+2: right n,  adr+4: left n+1,  adr+6: right n+1

the ai hardware implements a double buffering scheme to ensure that no samples are lost, even if the dspcpu is highly loaded and slow to respond to interrupts. the dspcpu software assigns buffers by writing a base address and size to the mmio control fields described in table 8-7. refer to section 8.7 for details on hardware/software synchronization.

in 8-bit capture modes, the eight msbs of the serial-parallel converter output data are written to memory. in 16-bit capture modes, all bits of the parallel data are written to memory. if sign_convert is set to '1', the msb of the data is inverted, which is equivalent to translating from two's complement to offset binary representation. this allows the use of an external two's complement 16-bit a/d converter to generate 8-bit unsigned samples, which is often used in pc audio.

note that the ai hardware does not generate a-law or μ-law 8-bit data formats. if such formats are desired, the dspcpu can be used to convert from 16-bit linear data to a-law or μ-law data.

figure 8-5. ai status/control field mmio layout.
mmio_base offset: ai_status (r/w) 0x10 1c00 ai_ctl (r/w) 0x10 1c04 ai_serial (r/w) 0x10 1c08 sckdiv ai_framing (r/w) 0x10 1c0c ai_freq (r/w) 0x10 1c10 ai_base1 (r/w) 0x10 1c14 frequency buf1_active ai_base2 (r/w) 0x10 1c18 base2 ai_size (r/w) 0x10 1c1c size (in samples) 31 0 3 7 11 15 19 23 27 validpos base1 overrun hbe (highway bandwidth error) buf2_full reset cap_enable cap_mode sign_convert little_endian 0 diagmode ovr_inten hbe_inten buf2_inten buf1_inten ack_ovr ack_hbe ack2 ack1 wsdiv ser_master datamode framemode polarity leftpos rightpos sspos 0 0 0 0 0 0 0 0 0 0 0 buf1_full sleepless clock_edge 0 0 0 0 0 0 31 0 3 7 11 15 19 23 27 31 0 3 7 11 15 19 23 27 31 0 3 7 11 15 19 23 27 31 0 3 7 11 15 19 23 27 reserved table 8-7. ai mmio dma control fields field name description little_endian 0 ? capture in big endian memory format (reset default) 1 ? capture little endian base1 base address of buffer1; a 64-byte aligned address in local sdram. reset default 0. base2 base address of buffer2; a 64-byte aligned address in local sdram. reset default 0. size ? number of samples to be placed in buffer before switching to other buffer ? stereo modes: a pair of 8- or 16-bit data is 1 sample ? mono modes: a single value is 1 sample ? reset default 0. cap_mode 00 ? mono (left adc only), 8 bits/sample. (reset default). 01 ? stereo, 2 times 8 bits/sample 10 ? mono (left adc only), 16 bits/sample 11 ? stereo, 2 times 16 bits/sample sign_convert 0 ? leave msb unchanged (reset default) 1 ? invert msb
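
the serial-to-parallel conversion of section 8.5 and the 8-bit capture path of section 8.6 can be modelled in a few lines of c. this is only a behavioural sketch under stated assumptions, not the hardware implementation: the function names are hypothetical, and a serial frame is represented as an array holding one bit per element, indexed by frame bit position.

```c
#include <assert.h>
#include <stdint.h>

/* Behavioural model (hypothetical helper, not a Philips API) of the AI
 * serial-to-parallel converter for one channel.
 * frame[i] holds the serial bit seen at frame position i (0 or 1).
 * pos is LEFTPOS or RIGHTPOS, sspos is 0..15, lsb_first mirrors DATAMODE. */
static uint16_t ai_capture_word(const uint8_t *frame, unsigned pos,
                                unsigned sspos, int lsb_first)
{
    assert(sspos <= 15);
    uint16_t word = 0;              /* bits [sspos-1:0] stay cleared */
    unsigned nbits = 16 - sspos;    /* number of captured serial bits */
    for (unsigned i = 0; i < nbits; i++) {
        /* msb first: fill from bit 15 downwards; lsb first: from sspos up */
        unsigned dst = lsb_first ? (sspos + i) : (15 - i);
        if (frame[pos + i] & 1)
            word |= (uint16_t)(1u << dst);
    }
    return word;
}

/* 8-bit capture modes store the eight MSBs of the converter output.
 * SIGN_CONVERT = 1 inverts the MSB first (two's complement to offset binary). */
static uint8_t ai_to_mem8(uint16_t word, int sign_convert)
{
    if (sign_convert)
        word ^= 0x8000u;
    return (uint8_t)(word >> 8);
}
```

with sspos = 4 both data modes capture 12 bits and leave bits 3..0 cleared; only the order in which the serial bits land in the word differs.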
8.7 audio in operation

figure 8-5, table 8-8 and table 8-9 describe the function of the control and status fields of the ai unit. to ensure compatibility with future devices, undefined bits in mmio registers should be ignored when read, and written as '0's.

the ai unit is reset by a pnx1300 hardware reset, or by writing 0x80000000 to the ai_ctl register. upon reset, capture is disabled (cap_enable = 0), and buffer1 is the active buffer (buf1_active=1). a minimum of 5 valid ai_sck clock cycles is required to allow internal ai circuitry to stabilize before enabling capture. this can be accomplished by programming ai_freq and ai_serial and then delaying for the appropriate time interval. the ai_serial mmio register must be programmed in the following order:
• set ai_freq to ensure that a valid clock is generated (only when ai is the master of the audio clock system)
• mmio(ai_ctl) = 1 << 31; /* software reset */
• mmio(ai_serial) = 1 << 31; /* sets serial-master mode, starts ai_sck */
• mmio(ai_serial) = (1 << 31) | (sckdiv value); /* then set divider values */

the dspcpu initiates capture by providing two equal size empty buffers and putting their base addresses and size in the base1, base2 and size registers. once two valid (local memory) buffers are assigned, capture can be enabled by writing a '1' to cap_enable. the ai unit hardware now proceeds to fill buffer 1 with input samples. once buffer 1 fills up, buf1_full is asserted, and capture continues without interruption in buffer 2. if buf1_inten is enabled, a source 11 interrupt request is generated.

table 8-8. ai mmio control fields
field name: description
reset: the ai logic is reset by writing 0x80000000 to ai_ctl. this bit always reads as '0'. see section 8.7, "audio in operation," for details on software reset.
diagmode:
    0 = normal operation (reset default)
    1 = diagnostic mode (see section 8.11, "diagnostic mode")
sleepless:
    0 = participate in global power down (reset default)
    1 = refrain from participating in power down
cap_enable: capture enable flag. if '1', the ai unit captures samples and acts as dma master to write samples to local sdram. if '0' (reset default), the ai unit is inactive.
buf1_inten: buffer 1 full interrupt enable. default 0.
    0 = no interrupt
    1 = interrupt (source 11) if buffer 1 full
buf2_inten: buffer 2 full interrupt enable. default 0.
    0 = no interrupt
    1 = interrupt (source 11) if buffer 2 full
hbe_inten: hbe interrupt enable. default 0.
    0 = no interrupt
    1 = interrupt (source 11) if a highway bandwidth error occurs
ovr_inten: overrun interrupt enable. default 0.
    0 = no interrupt
    1 = interrupt (source 11) if an overrun error occurs
ack1: write a '1' to clear the buf1_full flag and remove any pending buf1_full interrupt request. this bit always reads as 0.
ack2: write a '1' to clear the buf2_full flag and remove any pending buf2_full interrupt request. this bit always reads as 0.
ack_hbe: write a '1' to clear the hbe flag and remove any pending hbe interrupt request. this bit always reads as 0.
ack_ovr: write a '1' to clear the overrun flag and remove any pending overrun interrupt request. this bit always reads as 0.

table 8-9. ai mmio status fields (read only)
field name: description
buf1_active:
    • if '1', buffer 1 will be used for the next incoming sample. if '0', buffer 2 will receive the next sample.
    • 1 after reset.
buf1_full:
    • if '1', buffer 1 is full. if buf1_inten is also '1', an interrupt request (source 11) is pending. buf1_full is cleared by writing a '1' to ack1, at which point the ai hardware will assume that base1 and size describe a new empty buffer.
    • 0 after reset.
buf2_full:
    • if '1', buffer 2 is full. if buf2_inten is also '1', an interrupt request (source 11) is pending. buf2_full is cleared by writing a '1' to ack2, at which point the ai hardware will assume that base2 and size describe a new empty buffer.
    • 0 after reset.
hbe:
    • highway bandwidth error. condition raised when the 64-byte internal ai buffer is not yet written to sdram when a new input sample arrives. indicates insufficient allocation of pnx1300 highway bandwidth for the audio sampling rate/mode. refer to chapter 20, "arbiter."
    • 0 after reset.
overrun:
    • overrun error occurred, i.e. the cpu did not provide an empty buffer in time, and 1 or more samples were lost. if ovr_inten is also '1', an interrupt request (source 11) is pending. the overrun flag can only be cleared by writing a '1' to ack_ovr.
    • 0 after reset.
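
the status and ack fields of tables 8-8 and 8-9 suggest a simple shape for the source 11 interrupt service routine: read ai_status once, decide which buffers need a fresh assignment, and acknowledge every condition that was seen. the sketch below shows only that decision logic as a pure function. the bit masks are assumptions made for illustration (the text above does not give bit positions; take them from the register layout in figure 8-5), and the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMED bit positions, for illustration only. */
#define ST_BUF1_FULL  (1u << 0)
#define ST_BUF2_FULL  (1u << 1)
#define ST_HBE        (1u << 2)
#define ST_OVERRUN    (1u << 3)

#define CTL_ACK1      (1u << 0)
#define CTL_ACK2      (1u << 1)
#define CTL_ACK_HBE   (1u << 2)
#define CTL_ACK_OVR   (1u << 3)
#define CTL_ACK_ALL   (CTL_ACK1 | CTL_ACK2 | CTL_ACK_HBE | CTL_ACK_OVR)

typedef struct {
    uint32_t acks;      /* ack bits to write to AI_CTL */
    int refill_buf1;    /* assign a new empty buffer to BASE1 before ACK1 */
    int refill_buf2;    /* assign a new empty buffer to BASE2 before ACK2 */
    int samples_lost;   /* OVERRUN or HBE was flagged */
} ai_service_t;

/* Decide what the interrupt handler has to do for a given AI_STATUS value. */
static ai_service_t ai_service(uint32_t status)
{
    ai_service_t r = {0, 0, 0, 0};
    if (status & ST_BUF1_FULL) { r.refill_buf1 = 1; r.acks |= CTL_ACK1; }
    if (status & ST_BUF2_FULL) { r.refill_buf2 = 1; r.acks |= CTL_ACK2; }
    /* sticky error flags are not cleared by ACK1/ACK2; ack them explicitly */
    if (status & ST_HBE)       { r.samples_lost = 1; r.acks |= CTL_ACK_HBE; }
    if (status & ST_OVERRUN)   { r.samples_lost = 1; r.acks |= CTL_ACK_OVR; }
    return r;
}

/* The ack bits live in AI_CTL next to the enable bits, so the value written
 * must also carry the current control settings (kept here in a shadow copy). */
static uint32_t ai_ctl_with_acks(uint32_t ctl_shadow, uint32_t acks)
{
    assert((acks & ~CTL_ACK_ALL) == 0);   /* only ack bits may be passed */
    return ctl_shadow | acks;
}
```

a handler built on this would write the new base address first and only then write `ai_ctl_with_acks(...)`, because the hardware treats the buffer as empty as soon as the ack arrives.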
note that the buffers must be 64-byte aligned, and a multiple of 64 samples in size (the six lsbs of ai_base1, ai_base2 and ai_size are always '0').

the dspcpu is required to assign a new, empty buffer to base1 and perform an ack1 before buffer 2 fills up. capture continues in buffer 2 until it fills up. at that time, buf2_full is asserted, and capture continues in the new buffer 1, etc.

upon receipt of an ack, the ai hardware removes the related interrupt request line assertion at the next dspcpu clock edge. refer to section 3.5.3, "int and nmi (maskable and non-maskable interrupts)," for the rules regarding ack and interrupt re-enabling. the ai interrupt should always be operated in level-sensitive mode, since ai can signal multiple conditions that each need independent acks over the single internal source 11 request line.

in normal operation, the dspcpu and ai hardware continuously exchange buffers without ever losing a sample. if the dspcpu fails to provide a new buffer in time, the overrun error flag is raised. this flag is not affected by ack1 or ack2; it can only be cleared by an explicit ack_ovr.

8.8 power down and sleepless

the ai unit enters power down state whenever pnx1300 is put in global power down mode, except if the sleepless bit in ai_ctl is set. in the latter case, the unit continues dma operation and will wake up the dspcpu whenever an interrupt is generated.

the ai unit can be separately powered down by setting a bit in the block_power_down register. refer to chapter 21, "power management."

it is recommended that ai be stopped (by negating ai_ctl.cap_enable) before block level power down is started, or that sleepless mode is used when global power down is activated.

8.9 highway latency and hbe

the ai unit uses internal buffering before writing data to sdram. the internal buffer consists of one stereo sample input holding register and 64 bytes of internal buffer memory. under normal operation, the 64-byte buffer is written to sdram while the input register receives another sample. this normal operation is guaranteed to be maintained as long as the highway arbiter is set to guarantee a latency for the ai unit that matches the sampling interval. given a sample rate fs, and an associated sample interval t (in nsec), the arbiter should be set to have a latency of at most t-20 nsec. refer to chapter 20, "arbiter," for information on arbiter programming. if the requested latency is not adequate, the hbe (highway bandwidth error) condition may result. this error flag gets set when the input register is full, the 64-byte buffer has not yet been written to memory, and a new sample arrives.

table 8-10 shows the required arbiter latency settings for a number of common operating modes. the rightmost column illustrates the nature of the resulting 64-byte highway requests. this access pattern is not needed to compute arbiter settings, but it may be used to compute bus availability in a given interval.

table 8-10. ai highway arbiter latency requirement examples
capmode                  fs (khz)   t (ns)    max arbiter latency (nsec)   access pattern
stereo 16 bits/sample    44.1       22,676    22,656                       1 request every 362,812 nsec
stereo 16 bits/sample    48.0       20,833    20,813                       1 request every 333,333 nsec
stereo 16 bits/sample    96.0       10,417    10,397                       1 request every 166,667 nsec

8.10 error behavior

if either an overrun or hbe error occurs, input sampling is temporarily halted, and samples will be lost. in case of overrun, sampling resumes as soon as the dspcpu makes one or more new buffers available through an ack1 or ack2 operation. in the case of hbe, sampling will resume as soon as the internal buffer is written to sdram.

hbe and overrun are "sticky" error flags. they will remain set until an explicit ack_hbe or ack_ovr.

8.11 diagnostic mode

diagnostic mode is entered by setting the diagmode bit in the ai_ctl register. in diagnostic mode, the ai_sck, ai_ws and ai_sd inputs of the serial-to-parallel converter are taken from the output pins of the pnx1300 ao unit. this mode can be used during the diagnostic phase of system boot to verify correct operation of most of the ai unit and ao unit logic circuitry.

note that the inputs are truly taken from the pnx1300 ao external pins, i.e. if an external (board level) source is driving ao_sck or ao_ws, diagnostic mode is not capable of testing audio out.

special care must be taken to enable diagnostic mode. the recommended way of entering diagnostic mode is:
• set up the ao unit such that an ao_sck is generated
• set the diagmode bit, followed by a 5 (ai_sck) cycle delay
• perform a software reset of the ai unit and immediately set the diagmode bit back to '1'.
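
the entries of table 8-10 follow directly from the rule in section 8.9: the sample interval is t = 1/fs, the arbiter latency must be at most t-20 nsec, and one 64-byte highway request is issued for every 64 bytes of captured samples. the helper names below are hypothetical; the sketch simply reproduces the table arithmetic with integer rounding to the nearest nanosecond.

```c
#include <assert.h>
#include <stdint.h>

/* Integer division rounded to nearest. */
static uint32_t div_round(uint64_t num, uint64_t den)
{
    return (uint32_t)((num + den / 2) / den);
}

/* Sample interval t in nsec for a sample rate given in Hz. */
static uint32_t ai_sample_interval_ns(uint32_t fs_hz)
{
    assert(fs_hz > 0);
    return div_round(1000000000ull, fs_hz);
}

/* Section 8.9: arbiter latency must be at most t - 20 nsec. */
static uint32_t ai_max_arbiter_latency_ns(uint32_t fs_hz)
{
    uint32_t t = ai_sample_interval_ns(fs_hz);
    assert(t > 20);
    return t - 20;
}

/* Time between 64-byte highway requests.
 * bytes_per_sample: 1 (8-bit mono), 2 (8-bit stereo or 16-bit mono),
 * 4 (16-bit stereo). */
static uint32_t ai_request_interval_ns(uint32_t fs_hz, uint32_t bytes_per_sample)
{
    assert(bytes_per_sample == 1 || bytes_per_sample == 2 || bytes_per_sample == 4);
    uint32_t samples_per_request = 64 / bytes_per_sample;
    return div_round((uint64_t)samples_per_request * 1000000000ull, fs_hz);
}
```

for 16-bit stereo at 44.1 khz this gives t = 22,676 nsec, a maximum latency of 22,656 nsec, and one request every 362,812 nsec, matching the first row of the table.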
chapter 9 audio out
by gert slavenburg, santanu dutta

9.1 audio out overview

in this document, the generic pnx1300 name refers to the pnx1300 series, or the pnx1300/01/02/11 products.

the pnx1300 audio out (ao) unit contains many features not available in the tm-1000 and the tm-1100. it has up to 8 channels, and drives up to 4 external stereo d/a converters through a flexible bit-serial connection. it provides all signals to interface to high quality, low cost oversampling d/a converters, including a precisely programmable oversampling d/a system clock. the ao unit and external d/a's together provide the following capabilities:
• up to 8 channels of audio output.
• 16-bit or 32-bit samples per channel.
• programmable sampling rate.
• internal or external sampling clock source.
• autonomously reads processed audio data from memory using double buffering (dma).
• supports 16-bit mono and stereo pc standard memory data formats.
• supports little- and big-endian memory formats.
• provides control capability for highly integrated pc codecs such as the ad1847, cs4218 or uad1340.
• no support for connecting several d/as to one serial data output.

9.2 external interface

seven pnx1300 pins are associated with the ao unit. the ao_osclk output is an accurately programmable clock output intended to be used as the master system clock for the external d/a subsystem. the other pins (ao_sck, ao_ws and ao_sdx) constitute a flexible serial output interface. using the ao mmio registers, these pins can be configured to operate in a variety of serial interface framing modes, including but not limited to:
• standard stereo i2s (msb first, 1-bit delay from ao_ws, left & right data in a frame).
• lsb first, with 1 to 16-bit data per channel.
• complex serial frames of up to 512 bits/frame.
• up to 8 channels of audio output.

9.3 summary of operation

the ao unit consists of three major subsystems: a programmable sample clock generator, a dma engine and a data serializer.

the dma engine reads 16 or 32-bit samples from memory using a double buffered dma approach. the dspcpu initially assigns two full sample buffers containing an integral number of samples for all active channels. the dma engine retrieves samples from the first buffer until exhausted and continues from the second buffer, while requesting a new first sample buffer from the dspcpu, etc.

the samples are given to the data serializer, which sends them out in an msb first or lsb first serial frame format that can also contain 1 or 2 codec control words of up to 16 bits. the frame structure is highly programmable by a series of mmio fields.

table 9-1. ao unit external signals
signal (type): description
ao_osclk (out): over sampling clock. can be programmed to emit any frequency up to 40 mhz, with sub-hz resolution. intended for use as the 256fs or 384fs oversampling clock by the external d/a conversion subsystem.
ao_sck (io):
    • when ao is programmed to act as a serial interface timing slave (reset default), ao_sck acts as input. it receives the serial clock from the external audio d/a subsystem. the clock is treated as fully asynchronous to the pnx1300 main clock.
    • when ao is programmed to act as serial interface timing master, ao_sck acts as output. it drives the serial clock for the external audio d/a subsystem. clock frequency is a programmable integral divide of the ao_osclk frequency.
    ao_sck is limited to 22 mhz. the sample rate of valid samples embedded within the serial stream is limited by the ao_sck maximum frequency and the available highway bandwidth.
ao_ws (io):
    • when ao is programmed as the serial-interface timing slave (reset default), ao_ws acts as an input. ao_ws is sampled on the opposite ao_sck edge at which ao_sdx are asserted.
    • when ao is programmed as serial-interface timing master, ao_ws acts as an output. ao_ws is asserted on the same ao_sck edge as ao_sdx.
    ao_ws is the word-select or frame-sync signal from/to the external d/a subsystem. each audio channel receives 1 sample for every ws period. ao_ws can be set to change on ao_osclk positive or negative edges by the clock_edge bit.
ao_sd1 (out): serial data to stereo external audio d/a subsystem. ao_sd1 can be set to change on ao_osclk positive or negative edges by the clock_edge bit.
ao_sd2 (out): serial data to stereo external audio d/a subsystem. ao_sd2 can be set to change on ao_osclk positive or negative edges by the clock_edge bit.
ao_sd3 (out): serial data to stereo external audio d/a subsystem. ao_sd3 can be set to change on ao_osclk positive or negative edges by the clock_edge bit.
ao_sd4 (out): serial data to stereo external audio d/a subsystem. ao_sd4 can be set to change on ao_osclk positive or negative edges by the clock_edge bit.
9.4 internal clock source

figure 9-1 illustrates the different clock capabilities of the ao unit. at the heart of the clock system is a square wave dds (direct digital synthesizer). the dds can be programmed to emit frequencies from approx. 1 hz to 80 mhz with a sub-hertz resolution.

the output of the dds is always sent to the ao_osclk output pin. this output is intended to be used as the 256fs or 384fs system clock source for oversampling d/a converters, such as the philips saa7322, or codecs such as the ad1847, cs4218, or uad1340.

the pnx1300 dds frequency is set by writing to the frequency mmio register. the programmer is free to change the frequency setting dynamically, in order to adjust the outgoing audio sample rate. in atsc transport stream decoding, this is the method by which the system software locks audio output sample rate to the original program provider sample rate.

depending on bit 31 (msb), the dds runs in one of the two following modes:
• bit 31 = 1 (standard improved mode)
• bit 31 = 0 (tm-1000 compatibility mode)

figure 9-1. ao clock system and i/o interface.
[block diagram: the frequency register (bits 31..0) programs a square wave dds clocked at 9 x dspcpuclk, whose output drives ao_osclk (e.g. 256fs); a divide-by-(n+1) stage set by sckdiv (bits 7..0) derives ao_sck (e.g. 64fs), and a divide-by-(n+1) stage set by wsdiv (bits 8..0) derives ao_ws, both under control of ser_master; the parallel to serial converter takes left[15:0], right[15:0] and ao_cc[31:0] and drives ao_sdx.]

9.4.1 pnx1300 standard improved mode

this mode was first available in the tm-1100. in this mode, a high quality, low-jitter ao_osclk is generated. the setting of the frequency register to accomplish a given ao_osclk frequency is given by the formula:

\[ \mathrm{frequency} = 2^{31} + \frac{f_{osclk} \cdot 2^{32}}{9 \cdot f_{dspcpu}} \]

this mode, and the above formula, should be used for all new software development on pnx1300.

in the improved mode the dds synthesizer maximum jitter can be computed as follows:

\[ \mathrm{jitter} = \frac{1}{9 \cdot f_{dspcpu}} \]

examples of jitter values can be found in table 9-3.

table 9-2. clock system setting (f_dspcpu = 133 mhz)
fs         osclk    sck     frequency     sckdiv
44.1 khz   256fs    64fs    2187991971    3
48.0 khz   256fs    64fs    2191574340    3
44.1 khz   384fs    64fs    2208246133    5
48.0 khz   384fs    64fs    2213619686    5

table 9-3. jitter values for common dspcpu frequencies
f_dspcpu (mhz)   jitter (nsec)
143              0.777
166              0.669
180              0.617
200              0.555
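
the frequency register value for improved mode can be computed with 64-bit integer arithmetic. the sketch below assumes the formula frequency = 2^31 + f_osclk * 2^32 / (9 * f_dspcpu) from section 9.4.1 and rounds the quotient to the nearest integer, which reproduces the values listed in table 9-2 for f_dspcpu = 133 mhz. the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* FREQUENCY register value for improved mode (bit 31 = 1).
 * Both frequencies are given in Hz. */
static uint32_t ao_frequency_improved(uint32_t f_osclk_hz, uint32_t f_dspcpu_hz)
{
    assert(f_dspcpu_hz > 0);
    uint64_t den  = 9ull * f_dspcpu_hz;
    uint64_t step = (((uint64_t)f_osclk_hz << 32) + den / 2) / den;
    assert(step < (1ull << 31));   /* must fit below the mode bit */
    return (uint32_t)((1ull << 31) + step);
}

/* Maximum DDS jitter in nsec: 1 / (9 * f_dspcpu). */
static double ao_max_jitter_ns(uint32_t f_dspcpu_hz)
{
    return 1e9 / (9.0 * (double)f_dspcpu_hz);
}
```

for example, a 256fs clock at fs = 44.1 khz is 11,289,600 hz; with a 133 mhz dspcpu clock the function returns 2187991971, the first entry of table 9-2.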
9.4.2 tm-1000 compatibility mode

tm-1000 clock compatibility mode is provided so that tm-1000 audio software runs without changes. it should not be used for new software development, due to a 3x higher jitter. tm-1000 mode is automatically entered whenever frequency[31] = 0. in tm-1000 mode, ao_osclk frequency is set as follows:

\[ \mathrm{frequency} = \frac{f_{osclk} \cdot 2^{32}}{3 \cdot f_{dspcpu}} \]

9.5 clock system operation

the output of the dds is always sent to the ao_osclk output pin. this output is typically used as the 256fs or 384fs system clock source for oversampling d/a converters, such as the philips saa7322, or codecs such as the ad1847, cs4218 or uad1340.

ao_ws and ao_sck are sent to each external d/a converter in the master mode. ao_ws, the word strobe, determines the sample rate: each active channel receives one sample for each ao_ws period. ao_sck is the data bit clock. the number of ao_sck clocks in an ao_ws period is the number of data bits in a serial frame required by the attached d/a converter.

ao_ws is derived by dividing the bit clock; the divider is set using wsdiv to control the serial frame length. the number of bits per frame is equal to wsdiv+1. there are some minimum length requirements for a serial frame; refer to section 9.6.1.

ao_sck and ao_ws can be configured as input or output, as determined by the ser_master control field. if set as output, ao_sck can be set to a divider of the dds output frequency:

\[ f_{aosck} = \frac{f_{aoosclk}}{\mathrm{sckdiv} + 1}, \qquad \mathrm{sckdiv} \in [0, 255] \]

whether set as input or output, the ao_sck pin signal is always used as the bit clock for parallel-serial conversion. the ao_ws pin always acts as the trigger to start the generation of a serial frame.

the preferred use of the clock system options is to use ao_osclk as d/a master clock, and let the d/a converter be a timing slave of the serial interface (ser_master=1). this is important in view of compatibility with future trimedia devices, which may only support the ao unit as serial interface master. some d/a converters however, like the ad1847, provide better snr properties if they are configured as serial master, with the ao unit as slave (ser_master=0). as illustrated by figure 9-1, the internal parallel to serial converter that constructs the serial frame is oblivious to which component is timing master.

table 9-4. ao mmio clock & interface control
field name: description
ser_master:
    0 = (reset default) the d/a subsystem is the timing master over the ao serial interface. ao_sck and ao_ws act as inputs.
    1 = pnx1300 is the timing master over the serial interface. ao_sck and ao_ws act as outputs. this mode is required for 4, 6 or 8 channel operation.
    the ser_master bit should only be changed while the ao unit is disabled, i.e. trans_enable = 0.
frequency: sets the clock frequency emitted by the ao_osclk output. reset default 0.
sckdiv: sets the divider used to derive ao_sck from ao_osclk. set to 0..255, for division by 1..256. reset default 0.
wsdiv: sets the divider used to derive ao_ws from ao_sck. set to 0..511 for a serial frame length of 1..512. reset default 0.

9.6 serial data framing

the ao unit can generate data in a wide variety of serial data framing conventions. figure 9-2 illustrates the notion of a serial frame. if polarity=1, a frame starts with a positive edge of the ao_ws signal. if polarity=0, a serial frame starts with a negative edge on ao_ws.

if clock_edge=0, the parallel to serial converter samples ao_ws on a positive clock edge transition, and outputs the first bit (bit 0) of a serial frame on the next falling edge of ao_sck.

if clock_edge=1, the parallel to serial converter samples ao_ws on the negative edge of ao_sck, while audio data is output on the positive edge, i.e. the ao_sck polarity would be reversed with respect to figure 9-2.

figure 9-2. definition of serial frame bit positions (polarity = 1, clock_edge = 0).
[timing diagram: ao_sck, ao_ws and ao_sdx across frames n-1, n and n+1, with bit positions numbered from 0 at the start of each frame.]
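
the divider fields follow from two relations given in section 9.5: f_aosck = f_aoosclk / (sckdiv + 1) and frame length = wsdiv + 1 bits. since one frame is sent per sample period, ao_sck runs at (frame length) x fs. the sketch below derives both field values from the oversampling ratio of ao_osclk and the desired frame length; the struct and function names are hypothetical, and the range checks use the field limits of table 9-4.

```c
#include <assert.h>

typedef struct {
    unsigned sckdiv;   /* AO_SCK = AO_OSCLK / (sckdiv + 1), 0..255 */
    unsigned wsdiv;    /* frame length = wsdiv + 1 bits, 0..511 */
} ao_dividers_t;

/* osclk_per_fs: AO_OSCLK as a multiple of fs (e.g. 256 or 384).
 * frame_bits:   number of AO_SCK periods per AO_WS period.
 * Returns 0 on success, -1 if the combination cannot be programmed. */
static int ao_dividers(unsigned osclk_per_fs, unsigned frame_bits,
                       ao_dividers_t *out)
{
    assert(out != 0);
    if (frame_bits < 1 || frame_bits > 512)
        return -1;
    if (osclk_per_fs % frame_bits != 0)     /* divider must be integral */
        return -1;
    unsigned ratio = osclk_per_fs / frame_bits;
    if (ratio < 1 || ratio > 256)
        return -1;
    out->sckdiv = ratio - 1;
    out->wsdiv  = frame_bits - 1;
    return 0;
}
```

with a 256fs oversampling clock and a 64-bit frame this yields sckdiv = 3 and wsdiv = 63; with 384fs it yields sckdiv = 5, in line with table 9-2.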
every serial frame transmits a single left and right channel sample, and optional codec control data, to each d/a converter. the left and right sample data can be in an lsb first or msb first form, at an arbitrary serial frame bit position, and with an arbitrary length.

in msb-first mode (datamode = 0), the parallel to serial converter sends the value of left[msb] in bit position leftpos in the serial frame. subsequently, bits from decreasing bit positions in the left data word, up to and including left[sspos], are transmitted in order.

in lsb-first mode (datamode = 1), the parallel to serial converter sends the value of left[sspos] in bit position leftpos in the serial frame. subsequent bits from the left data word, up to and including left[msb], are transmitted in order. table 9-6 shows the transmitted bits in different modes.

frame bits that do not belong to either left[msb:sspos] or right[msb:sspos] or a codec control field (section 9.7, "codec control") are shifted out as zero. this zero extension ensures that pnx1300 can be used in combination with d/a converters of higher precision than the actual number of transmitted bits in the current operating mode, e.g. 18-bit d/as operating with 16-bit memory data.

table 9-5. ao serial framing control fields
field name: description
polarity:
    0 = serial frame starts with an ao_ws negedge (reset default)
    1 = serial frame starts with an ao_ws posedge
    this bit should not be changed during operation of the ao unit, i.e. only update this bit when trans_enable = 0.
leftpos(9): defines the bit position within a serial frame where the first data bit of the left channel is placed. reset default '0'.
rightpos(9): defines the bit position within a serial frame where the first data bit of the right channel is placed. reset default '0'.
datamode:
    0 = msb first (reset default)
    1 = lsb first
sspos: start/stop bit position. reset default 0. note that sspos is a 5-bit field, with sspos bit 4 not adjacent. this is for backwards compatibility in 16 bits/sample modes with tm-1000/1100.
    • if datamode=msb first, transmission starts with the msb of the sample, i.e. bit 15 for 16 bits/sample modes or bit 31 for 32 bits/sample modes. sspos determines the bit index (0..31) in the parallel input word of the last transmitted data bit.
    • if datamode=lsb first, sspos determines the bit index (0..31) in the parallel word of the first transmitted data bit. bits sspos up to and including the msb are transmitted, i.e. up to bit 15 in 16 bits/sample mode and bit 31 in 32 bits/sample mode.
    see table 9-6 for more information.
clock_edge:
    0 = the parallel to serial converter samples ao_ws on positive edges of ao_sck and outputs data on the negative edge of ao_sck (reset default).
    1 = the parallel to serial converter samples ao_ws on negative edges of ao_sck and outputs data on positive edges of ao_sck.
ws_pulse:
    0 = emit 50% ao_ws (reset default).
    1 = emit single ao_sck cycle ao_ws
nr_chan:
    00 = only ao_sd1 is active
    01 = ao_sd1 and 2 are active
    10 = ao_sd1, 2 and 3 are active
    11 = ao_sd1..sd4 are active
    each sd output receives 1 or 2 channels, depending on whether trans_mode is mono or stereo respectively. non-active channels receive 0 value samples. in mono modes, each channel of an sd output receives identical left & right samples. see also table 9-10.

table 9-6. bits transmitted for each memory data item s
operating mode                first bit   last bit    valid sspos values
16 bits/sample, msb-first     s[15]       s[sspos]    0..15
16 bits/sample, lsb-first     s[sspos]    s[15]       0..15
32 bits/sample, msb-first     s[31]       s[sspos]    0..31
32 bits/sample, lsb-first     s[sspos]    s[31]       0..31

9.6.1 serial frame limitations

due to the implementation, there is a minimum serial frame length required that is operating mode dependent. this is shown in table 9-7.

table 9-7. minimum serial frame length in bits
operating mode             minimum serial frame length
16 bits/sample, mono       13 bits
32 bits/sample, mono       13 bits
16 bits/sample, stereo     13 bits
32 bits/sample, stereo     36 bits
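
the framing rules of section 9.6 and table 9-6 can be checked with a small software model that builds one serial frame as an array of bits. this is a sketch under simplifying assumptions, not the hardware behaviour in every corner case: codec control fields are left out, the left and right fields are assumed not to overlap, and all type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    unsigned frame_bits;    /* WSDIV + 1 */
    unsigned leftpos;       /* frame position of first left data bit */
    unsigned rightpos;      /* frame position of first right data bit */
    unsigned sspos;         /* start/stop bit index in the sample word */
    unsigned sample_bits;   /* 16 or 32 bits/sample */
    int      lsb_first;     /* DATAMODE */
} ao_framing_t;

/* Place sample bits [msb:sspos] into the frame starting at pos. */
static void ao_put_field(uint8_t *frame, const ao_framing_t *c,
                         unsigned pos, uint32_t s)
{
    unsigned msb = c->sample_bits - 1;
    unsigned n   = msb - c->sspos + 1;           /* number of bits sent */
    assert(pos + n <= c->frame_bits);
    for (unsigned i = 0; i < n; i++) {
        unsigned src = c->lsb_first ? (c->sspos + i) : (msb - i);
        frame[pos + i] = (uint8_t)((s >> src) & 1u);
    }
}

/* frame must hold c->frame_bits elements; each element is one serial bit. */
static void ao_build_frame(uint8_t *frame, const ao_framing_t *c,
                           uint32_t left, uint32_t right)
{
    assert(c->sample_bits == 16 || c->sample_bits == 32);
    assert(c->sspos < c->sample_bits);
    memset(frame, 0, c->frame_bits);   /* unused positions are sent as 0 */
    ao_put_field(frame, c, c->leftpos, left);
    ao_put_field(frame, c, c->rightpos, right);
}
```

with a 64-bit frame, leftpos = 0, rightpos = 32, sspos = 0 and msb-first 16-bit samples, positions 0..15 carry left[15]..left[0], positions 32..47 carry right[15]..right[0], and every other position is zero.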
9.6.2 i2s serial framing example

refer to figure 9-3 and table 9-8 to see how the ao unit mmio registers should be set to transmit 16 or 32 bits of stereo data via an i2s serial standard to an 18-bit d/a converter with a 64-bit serial frame.

table 9-8. example setup for 64-bit i2s framing
field        value   explanation
polarity     0       frame starts with negedge ao_ws.
leftpos      0       left[msb] will go to serial frame position 0.
rightpos     32      right[msb] will go to serial frame position 32.
datamode     0       msb first.
sspos        0       stop with left/right[0], send 0's after. (for 32 bits/sample mode, this field could be set to 14 to ensure zeroes in all unused bit positions)
clock_edge   0       ao_sdx change on negedge ao_sck.
wsdiv        63      serial frame length = 64.
ws_pulse     0       emit 50% duty cycle ao_ws.

figure 9-3. serial frame (64 bits) of an 18-bit precision i2s d/a converter.
[timing diagram: ao_sck, ao_ws and ao_sdx; left channel data n (18 bits) starts at bit position 0 and right channel data n (18 bits) starts at bit position 32 of the 64-bit frame, followed by left channel data n+1.]

9.7 codec control

in addition to the left and right data fields that are generated based on autonomous dma action, a serial frame generated by the ao unit can be set to contain 1 or 2 control fields up to 16 bits in length. each control field can be independently enabled/disabled by the cc1_en, cc2_en bits in ao_ctl. the content shifted into the frame is taken from the cc1 and cc2 fields in the ao_cc register. the cc1_pos and cc2_pos fields in the ao_cfc register determine the first bit position in the frame where the control field is emitted. the field is emitted observing the setting of datamode, i.e. lsb or msb first.

the cc_busy bit in ao_status indicates if the ao unit is ready to receive another cc1, cc2 value pair. writing a new value pair to ao_cc writes the value into a buffer register, and raises the cc_busy status. as soon as both cc1 and cc2 values have been copied to a shadow register in preparation for transmission, cc_busy is negated, indicating that the ao logic is ready to accept a new codec control pair. the old cc1/cc2 data keeps being transmitted, i.e. software is not required to provide new cc1 and cc2 data.

software always needs to ensure that the cc_busy status is negated before writing a new cc1, cc2 pair. by polling cc_busy, the dspcpu can reliably emit a sequence of individual audio frames with distinct control field values. this can, for example, be used during codec initialization. no provision is made for interrupt driven operation of such a sequence of control values; it is assumed that after initialization, the values of the control fields determine slowly and asynchronously changing parameters such as volume.

it is legal to program the control field positions within the frame such that cc1 and cc2 overlap each other and/or left/right data fields. if two fields are defined to start at the same bit position, the priority is left (highest), right, cc1, then cc2. the field with the highest priority will be emitted starting at the conflicting bit position. if a field f2 is defined to start at a bit position i that falls within a field f1 starting at a lower bit position, f2 will be emitted starting from i and the rest of f1 will be lost. any bit positions not belonging to a data or control field will be emitted as '0'.

table 9-9. ao mmio codec control/status fields
field name: description
cc1(16): the 16-bit value of cc1 is shifted into each emitted serial frame starting at bit position cc1_pos, as long as cc1_en is asserted.
cc1_pos: defines the bit position within a serial frame where the first data bit of cc1 is placed. reset default 0.
cc1_en:
    0 = cc1 emission disabled (reset default)
    1 = cc1 emission enabled.
cc2(16): the 16-bit value of cc2 is shifted into each emitted serial frame starting at bit position cc2_pos, as long as cc2_en is asserted.
cc2_pos: defines the bit position within a serial frame where the first data bit of cc2 is placed. default 0.
cc2_en:
    0 = cc2 emission disabled (reset default)
    1 = cc2 emission enabled.
cc_busy:
    0 = ao is ready to receive a cc1, cc2 pair (reset default).
    1 = ao is not ready to receive a cc1, cc2 pair. try again in a few sck clock intervals.
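
the cc_busy handshake of section 9.7 amounts to a bounded polling loop in front of every ao_cc write. the sketch below illustrates that pattern only. the position of cc_busy within ao_status and the packing of cc1 and cc2 inside ao_cc are assumptions made for illustration (neither is stated in the text above), and the function takes plain pointers so that it can be exercised without hardware; on a real system they would point at the ao mmio registers.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMED bit position of CC_BUSY in AO_STATUS, for illustration only. */
#define AO_CC_BUSY_ASSUMED  (1u << 4)

/* Write a new CC1/CC2 pair once the AO unit is ready to accept it.
 * Returns 0 on success, -1 if CC_BUSY stayed set for max_polls reads. */
static int ao_write_cc(volatile const uint32_t *ao_status,
                       volatile uint32_t *ao_cc,
                       uint16_t cc1, uint16_t cc2, unsigned max_polls)
{
    assert(ao_status != 0 && ao_cc != 0);
    while (*ao_status & AO_CC_BUSY_ASSUMED) {
        if (max_polls == 0)
            return -1;          /* give up instead of spinning forever */
        max_polls--;
    }
    /* ASSUMED packing: CC1 in the upper half, CC2 in the lower half. */
    *ao_cc = ((uint32_t)cc1 << 16) | cc2;
    return 0;
}
```

a codec initialization sequence would call this once per control word pair; a return value of -1 suggests that no serial frames are being generated, since cc_busy is only negated once the pair has been copied for transmission.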
figure 9-4 shows a 64-bit frame suitable for use with the cs4218 codec. it is obtained by setting polarity=1, leftpos=0, rightpos=32, datamode=0, sspos=0, clock_edge=1, ws_pulse=1, cc1_pos=16, cc1_en=1, cc2_pos=48, cc2_en=1.

note that frames are generated (externally or internally) even when trans_enable is de-asserted. writes to cc1 and cc2 should only be done after trans_enable is asserted. the "first" cc values will then go out on the next frame. for a summary of codec control fields see table 9-9.

[figure 9-4. example codec frame layout for a crystal semi cs4218: 64-bit frame carrying left channel data n (16), cc1 (16), right channel data n (16) and cc2 (16); signals ao_sck, ao_ws, ao_sdx.]

9.8 memory data formats

the ao unit autonomously reads samples from memory in 16 or 32 bit-per-sample memory formats, as shown in figure 9-5 for some example modes. memory samples are retrieved and used as described in table 9-10. successive samples are always read from increasing memory address locations.

the setting of the little_endian bit in the ao_ctl register determines the byte order of retrieved 16 or 32-bit samples. refer to appendix c, "endian-ness," for details on byte ordering conventions.

ao hardware implements a double buffering scheme to ensure that there are always samples available to transmit, even if the dspcpu is highly loaded and slow to respond to interrupts. the dspcpu software assigns 2 equal size buffers by writing a base address and size to the mmio control fields described in figure 9-6. refer to section 9.9, "audio out operation," for details on hardware/software synchronization.

if sign_convert is set to one, the msb of the memory data is inverted, which is equivalent to translating from offset binary representation to two's complement. this allows the use of an external two's complement 16-bit d/a converter to generate audio from 16-bit unsigned samples. this msb inversion also applies to the "0" values transmitted to non-active output channels.

note that the ao hardware does not support a-law or µ-law eight-bit data formats. if such formats are desired, the dspcpu should be used to convert from a-law or µ-law data to 16-bit linear data.

table 9-10. operating modes and memory formats
nr_chan | mode | destination of successive samples
00 | mono | sd1.left
00 | stereo | sd1.left, sd1.right
01 | mono | sd1.left, sd2.left
01 | stereo | sd1.left, sd1.right, sd2.left, sd2.right
10 | mono | sd1.left, sd2.left, sd3.left
10 | stereo | sd1.left, sd1.right, sd2.left, sd2.right, sd3.left, sd3.right
11 | mono | sd1.left, sd2.left, sd3.left, sd4.left
11 | stereo | sd1.left, sd1.right, sd2.left, sd2.right, sd3.left, sd3.right, sd4.left, sd4.right

figure 9-5. ao memory dma formats.
16-bit, stereo, nr_chan=00: adr sd1.left n; adr+2 sd1.right n; adr+4 sd1.left n+1; adr+6 sd1.right n+1; adr+8 sd1.left n+2; adr+10 sd1.right n+2; adr+12 sd1.left n+3; adr+14 sd1.right n+3
32-bit, stereo, nr_chan=00: adr sd1.left n; adr+4 sd1.right n; adr+8 sd1.left n+1; adr+12 sd1.right n+1
16-bit, stereo, nr_chan=10: adr sd1.left n; adr+2 sd1.right n; adr+4 sd2.left n; adr+6 sd2.right n; adr+8 sd3.left n; adr+10 sd3.right n; adr+12 sd1.left n+1; adr+14 sd1.right n+1
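the layouts in figure 9-5 follow one rule: samples are stored back to back in the destination order of table 9-10. a minimal sketch of that rule is shown below. the function name `ao_sample_offset` and its parameters are hypothetical, and the mono case is inferred from table 9-10 since figure 9-5 only shows stereo examples.

```c
#include <stdint.h>

/* Byte offset, relative to the buffer base address, of one sample in an
   AO DMA buffer laid out as in figure 9-5 and table 9-10.
   bits_per_sample: 16 or 32
   stereo:          0 = mono, 1 = stereo
   nr_chan:         NR_CHAN field value 0..3 (outputs SD1..SD4 active)
   n:               sample period index (n, n+1, ... in figure 9-5)
   output:          0 for SD1 .. 3 for SD4
   right:           0 = left, 1 = right (must be 0 in mono mode) */
uint32_t ao_sample_offset(unsigned bits_per_sample, int stereo,
                          unsigned nr_chan, uint32_t n,
                          unsigned output, int right)
{
    uint32_t per_output = stereo ? 2u : 1u;              /* samples per output */
    uint32_t per_period = (nr_chan + 1u) * per_output;   /* samples per period */
    uint32_t slot = output * per_output + (right ? 1u : 0u);
    return (bits_per_sample / 8u) * (n * per_period + slot);
}
```

for example, in 16-bit stereo with nr_chan=10 the call `ao_sample_offset(16, 1, 2, 1, 0, 0)` returns 12, matching the "adr+12 sd1.left n+1" entry of figure 9-5.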
9.9 audio out operation

figure 9-6, table 9-11 and table 9-12 describe the function of the control and status fields of the ao unit. to ensure compatibility with future devices, any undefined or reserved mmio bits should be ignored when read, and written as zeroes.

the ao unit is reset by a pnx1300 hardware reset, or by writing 0x80000000 to the ao_ctl register. the ao unit is not affected by dspcpu reset initiated through the biu_ctl register. either reset method sets all mmio fields as indicated in the tables.

the timestamp counter is reset by tri_reset# or by dspcpu reset initiated through biu_ctl. it is not affected by ao_ctl reset. this ensures that the timestamp counter stays synchronous with the dspcpu cccount register.

after an ao reset, 5 ao_sck clock cycles are required to stabilize the internal circuitry before enabling audio out. this can be accomplished by programming the ao_freq and ao_serial registers to start ao_sck generation, then waiting for the appropriate 5 ao_sck cycle interval.

programming of the ao_serial mmio register needs to follow this sequence:
- set ao_freq to ensure that a valid clock is generated (only when ao is the master of the audio clock system)
- mmio(ao_ctl) = 1 << 31; /* software reset */
- mmio(ao_serial) = 1 << 31; /* sets serial-master mode, starts ao_sck */
- mmio(ao_serial) = (1 << 31) | (sckdiv value); /* then set divider values */

figure 9-6. ao status/control field mmio layout.
register (access) | mmio_base offset
ao_status (r/w) | 0x10 2000
ao_ctl (r/w) | 0x10 2004
ao_serial (r/w) | 0x10 2008
ao_framing (r/w) | 0x10 200c
ao_freq (r/w) | 0x10 2010
ao_base1 (r/w) | 0x10 2014
ao_base2 (r/w) | 0x10 2018
ao_size (r/w) | 0x10 201c
ao_cc (r/w) | 0x10 2020
ao_cfc (r/w) | 0x10 2024
ao_tstamp (r/o) | 0x10 2028
fields shown in the figure (bit positions are not recoverable from this text): buf1_active, buf1_empty, buf2_empty, hbe (highway bandwidth error), underrun, reset, trans_enable, trans_mode, sign_convert, little_endian, sleepless, udr_inten, hbe_inten, buf2_inten, buf1_inten, ack_udr, ack_hbe, ack2, ack1, ser_master, datamode, clock_edge, nr_chan, wsdiv, sckdiv, polarity, leftpos, rightpos, sspos, sspos[4], frequency, base1, base2, size (in samples), cc1, cc2, cc1_pos, cc2_pos, cc1_en, cc2_en, ws_pulse, cc_busy, timestamp, reserved.

upon reset, transmission is disabled (trans_enable = 0), and buffer 1 is the active buffer (buf1_active=1). the dspcpu initiates transmission by providing two full, equal size buffers and putting their base address and size in the basen and size registers. once two valid buffers are assigned, transmission can be enabled by writing a "1" to trans_enable. the ao hardware now proceeds to empty buffer 1 by transmission of output samples. once buffer 1 empties, buf1_empty is asserted, and transmission continues without interruption from buffer 2. if buf1_inten is enabled, a source 12 interrupt request is generated.

note that buffers must be 64-byte aligned (the six lsbs of ao_base1, ao_base2 are zero). buffer sizes must be a multiple of 64 samples (the 6 lsbs of ao_size are zero).

the dspcpu is required to assign a new, full buffer to base1 and perform an ack1 before buffer 2 empties. transmission continues from buffer 2 until it is empty. at that time, buf2_empty is asserted and transmission continues from the new buffer 1, etc.

an ack performs two functions: it tells the ao unit that the corresponding base register now points to a buffer filled with samples, and it clears buf_empty. upon receipt of an ack, the ao hardware removes the buf_empty related interrupt request line assertion at the next dspcpu clock edge. refer to the interrupt controller documentation for details on interrupt handler programming. the ao interrupt (source 12) should always be operated in level sensitive mode.

9.10 interrupts

the ao unit has a private interrupt request line to the dspcpu vectored interrupt controller. it uses src# 12 (same as tm-1000/tm-1100/tm-1300 ao).
an interrupt is asserted as long as one or more of the underrun, hbe, buf1_empty or buf2_empty condition flags and the corresponding inten bit are asserted. interrupts are sticky, i.e. an interrupt remains asserted until the software explicitly clears the condition flag by an ack_x action.

table 9-11. ao mmio dma control fields
field name | description
little_endian | 0 = big endian memory format (reset default). 1 = little endian.
base1 | base address of buffer 1. must be a 64-byte aligned address in local sdram. reset default 0.
base2 | base address of buffer 2. must be a 64-byte aligned address in local sdram. reset default 0.
size | dma buffer size, in samples. this number of mono samples or stereo sample pairs is read from a dma buffer before switching to the other buffer. buffer size in bytes is as follows: 16 bps, mono: 2 * size; 32 bps, mono: 4 * size; 16 bps, stereo: 4 * size; 32 bps, stereo: 8 * size. reset default 0.
trans_mode | 00 = mono, 32 bits/sample (reset default); left data and right data sent to each active output are the same. 01 = stereo, 32 bits/sample. 10 = mono, 16 bits/sample; left data and right data are the same. 11 = stereo, 16 bits/sample. refer to table 9-10 for an explanation of how trans_mode and nr_chan map to output behavior.
sign_convert | 0 = leave msb unchanged (reset default). 1 = invert msb (not applied to codec control fields).

table 9-12. ao dma status fields (read only)
field name | description
buf1_active | if 1, buffer 1 will be used for the next sample to be transmitted. if 0, buffer 2 will contain the next sample. 1 after reset.
buf1_empty | if 1, buffer 1 is empty. if buf1_inten is also 1, an interrupt request (source 12) is asserted. buf1_empty is cleared by writing a "1" to ack1, at which point the ao hardware will assume that base1 and size describe a new full buffer. 0 after reset.
buf2_empty | if 1, buffer 2 is empty. if buf2_inten is also 1, an interrupt request (source 12) is asserted. buf2_empty is cleared by writing a "1" to ack2, at which point the ao hardware will assume that base2 and size describe a new full buffer. 0 after reset.
hbe | highway bandwidth error. indicates that no data was transmitted due to inability to read the local ao buffer from sdram in time. this indicates an insufficient allocation of pnx1300 highway bandwidth for the audio sampling rate/mode. 0 after reset.
underrun | an underrun error has occurred, i.e. the cpu failed to provide a full buffer in time, and no samples were transmitted, although requested by the d/a converter. if udr_inten is also 1, an interrupt request (source 12) is pending. the underrun flag can only be cleared by writing a "1" to ack_udr. 0 after reset.
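the size and alignment rules in table 9-11 and section 9.9 can be checked in software before a buffer pair is handed to the ao unit. the sketch below is illustrative only: the function names and enum constants are hypothetical, the byte counts are the four cases listed for the size field, and rejecting a size of zero is an added assumption that the text does not state.

```c
#include <stdint.h>

/* TRANS_MODE encodings from table 9-11. */
enum { AO_MONO32 = 0, AO_STEREO32 = 1, AO_MONO16 = 2, AO_STEREO16 = 3 };

/* Buffer size in bytes for a given SIZE value, per the size row of
   table 9-11: bit 1 of trans_mode selects 16-bit samples, bit 0 stereo. */
uint32_t ao_buffer_bytes(unsigned trans_mode, uint32_t size)
{
    uint32_t bytes_per_sample = (trans_mode & 2u) ? 2u : 4u;
    uint32_t samples_per_unit = (trans_mode & 1u) ? 2u : 1u;
    return size * bytes_per_sample * samples_per_unit;
}

/* Returns 1 if base1, base2 and size meet the stated rules: both base
   addresses 64-byte aligned, size a multiple of 64 samples.
   A size of 0 is rejected here as an assumption. */
int ao_buffers_valid(uint32_t base1, uint32_t base2, uint32_t size)
{
    return (base1 & 63u) == 0 && (base2 & 63u) == 0 &&
           size != 0 && (size & 63u) == 0;
}
```

a driver could call `ao_buffers_valid()` before writing base1, base2 and size, and use `ao_buffer_bytes()` to size the two memory allocations.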
9.11 timestamp

the ao_tstamp mmio register provides a 32-bit timestamp value that contains the cccount time value at which the last sample of the last dma buffer transmitted was sent across the sd output pin. this value is available for software inspection (read-only) in the interrupt handler for bufx_empty.

the implementation involves an internal dspcpu clock cycle counter that is reset to have the same value as the dspcpu cccount register. it is guaranteed to be in sync with the 32 lsbs of cccount provided that pcsw.cs=1.

9.12 powerdown and sleepless

the ao unit enters powerdown state whenever pnx1300 is put in global powerdown mode, except if the sleepless bit in ao_ctl is set. in the latter case, the block continues dma operation and will wake up the dspcpu whenever an interrupt is generated. the internal timestamp counter never powers down, to ensure that it remains synchronous with cccount.

the ao unit can be separately powered down by setting a bit in the block_power_down register. refer to chapter 21, "power management."

if the block enters powerdown state, ao_sck, ao_sdx, and ao_ws hold their value stable. ao_osclk continues to provide a d/a converter clock. the signals resume their original transitions at the point where they were interrupted once the system wakes up. the external d/a converter subsystem is most likely confused by this behavior, hence it is recommended that the ao unit be stopped (by negating trans_enable) before block level powerdown is started, or that sleepless mode be used when global powerdown is activated.

9.13 highway latency and hbe

the ao unit uses an internal 64-byte buffer as well as an output holding register that contains a single mono sample or single stereo sample pair.
under normal operation, the internal buffer is refreshed from sdram fast enough to avoid any missing samples, while data is being emitted from the holding register. if the highway arbiter is set up with an insufficient latency guarantee, the situation can arise that the 64-byte buffer is not refilled and the holding register is exhausted by the time a new output sample is due. in that case the hbe error is raised. the last sample for each channel will be repeated until the buffer is refreshed. the hbe condition is sticky, and can only be cleared by an explicit ack_hbe. this condition indicates an incorrect setting of the highway bandwidth arbiter.

given a sample rate f_s, and an associated sample interval t (in ns), the arbiter should be set to have a latency of at most t - 20 ns for all modes. the latency for 4, 6 and 8 channel modes can be computed as if the system is operating in stereo mode with a 2x, 3x and 4x sample rate, respectively.

table 9-14 shows the required arbiter latency settings for a number of common operating modes. the rightmost column in table 9-14 illustrates the nature of the resulting 64-byte highway requests. these access patterns are not necessary to compute arbiter settings, but they may be used to compute bus availability in a given interval. refer to chapter 20, "arbiter," for information on arbiter programming.

table 9-13. ao mmio control fields
field name | description
reset | resets the audio-out logic. see section 9.9, "audio out operation," for a description of the recommended procedure.
trans_enable | transmission enable flag. 0 = ao inactive (reset default). 1 = ao transmits samples and acts as dma master to read samples from local sdram. do not change the polarity bit while transmission is enabled.
sleepless | 0 = ao goes into power-down mode if pnx1300 goes to global powerdown mode (power up default). 1 = ao continues operation when pnx1300 goes to global powerdown mode; samples are read from memory as needed, and ao interrupts, when enabled, will wake up the dspcpu.
buf1_inten | buffer 1 empty interrupt enable. 0 = no interrupt (default). 1 = interrupt (source 12) if buffer 1 empty.
buf2_inten | buffer 2 empty interrupt enable. 0 = no interrupt (default). 1 = interrupt (source 12) if buffer 2 empty.
hbe_inten | hbe interrupt enable. 0 = no interrupt (default). 1 = interrupt (source 12) if a highway bandwidth error occurs.
udr_inten | underrun interrupt enable. 0 = no interrupt (default). 1 = interrupt (source 12) if an underrun error occurs.
ack1 | write a 1 to clear the buf1_empty flag and remove any pending buf1_empty interrupt request. ack1 always reads 0.
ack2 | write a 1 to clear the buf2_empty flag and remove any pending buf2_empty interrupt request. ack2 always reads 0.
ack_hbe | write a 1 to clear the hbe flag and remove any pending hbe interrupt request. ack_hbe always reads as 0.
ack_udr | write a 1 to clear the underrun flag and remove any pending underrun interrupt request. ack_udr always reads 0.
9.14 error behavior

in normal operation, the dspcpu and ao hardware continuously exchange buffers without ever failing to transmit a sample. if the dspcpu fails to provide a new buffer in time, the underrun error flag is raised, and the last valid sample or sample pair is repeated until a new buffer of data is assigned by an ack1 or ack2. the underrun flag is not affected by ack1 or ack2; it can only be cleared by an explicit ack_udr.

if an hbe error occurs, the last valid sample or sample pair is repeated until the ao hardware retrieves a new sample buffer across the highway.

table 9-14. ao highway arbiter latency requirement examples
transmode | f_s (khz) | t (ns) | max. arbiter latency (ns) | access pattern
stereo 16 bits/sample | 44.1 | 22,676 | 22,656 | 1 request every 362,812 ns
stereo 16 bits/sample | 48.0 | 20,833 | 20,813 | 1 request every 333,333 ns
stereo 16 bits/sample | 96.0 | 10,417 | 10,397 | 1 request every 166,667 ns
6 channel 16 bits/sample | 48.0 | 20,833 | 6,924 | 1 request every 111,111 ns
stereo 32 bits/sample | 48.0 | 20,833 | 20,813 | 1 request every 166,667 ns
6 channel 32 bits/sample | 48.0 | 20,833 | 6,924 | 1 request every 55,556 ns
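the entries of table 9-14 follow from the rule in section 9.13 and from the 64-byte request size. the sketch below recomputes them. the function names are hypothetical, and the code assumes an even channel count of 2 or more, since the text gives the rule only for stereo and for 4, 6 and 8 channel modes.

```c
/* Maximum arbiter latency in ns: the sample interval t divided by the
   number of stereo pairs (a 6 channel mode is treated as stereo at 3x
   the sample rate), minus the 20 ns margin from section 9.13.
   Assumes channels is even and at least 2. */
double ao_max_latency_ns(double fs_hz, unsigned channels)
{
    double t_ns = 1e9 / fs_hz;
    double pairs = channels / 2.0;
    return t_ns / pairs - 20.0;
}

/* Average interval in ns between 64-byte highway requests: the number
   of sample periods that fit in 64 bytes, times the sample interval. */
double ao_request_interval_ns(double fs_hz, unsigned channels,
                              unsigned bits_per_sample)
{
    double bytes_per_period = channels * (bits_per_sample / 8.0);
    return (64.0 / bytes_per_period) * (1e9 / fs_hz);
}
```

for 6 channels at 48 khz and 16 bits/sample this gives a maximum latency of about 6,924 ns and one request about every 111,111 ns, in line with the fourth row of the table.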
chapter 10 spdif out
by gert slavenburg, santanu dutta

10.1 spdif out overview

in this document, the generic pnx1300 name refers to the pnx1300 series, or the pnx1300/01/02/11 products.

the pnx1300 spdif output unit (spdo) allows generation of a 1-bit high-speed serial data stream. the primary application is to make spdif (sony/philips digital interface) data available for use by external audio equipment. the spdo unit has the following features:
- fully compliant with iec958, for both consumer and professional applications
- supports 2-channel linear pcm audio, with 16 or 24 bits per sample
- supports one or more dolby digital(r) 6-channel data streams embedded per project 1937
- supports one or more mpeg-1 or mpeg-2 audio streams embedded per project 1937
- allows arbitrary, programmable sample rates from 1 hz to 300 khz
- can output data with a sample rate independent of and asynchronous to the sample rate of the audio out (ao) unit
- hardware performs autonomous dma of memory resident iec958 sub-frames
- hardware performs parity generation and bi-phase mark encoding
- allows software to have full control over all data content, including user and channel data

alternate use of the spdo unit to generate a general-purpose high-speed data stream is possible. potential applications include use as a high-speed uart or high speed serial data channel. in this case features are:
- up to 40 mbit/sec data rate
- full software control over each bit cell transmitted
- lsb first or msb first data format

10.2 external interface

the external interface consists of only one pin, spdo, which is described in table 10-1.

an external circuit (see figure 10-1) is required to provide an electrically isolated output and convert the 3.3 v output pin to a drive level of 0.5 v peak-peak into a 75-ohm load, as required for consumer applications of iec-958.
table 10-1. spdo external signals
signal | type | description
spdo | i/o | spdif output. self clocking interface carrying either 2-channel pcm data with samples up to 24 bits, or encoded dolby ac-3(r) or mpeg audio data for decoding by an external audio component.

[figure 10-1. external spdif interface circuitry between the pnx1300 spdo pin and an rca phono connector; component labels: 10 uf, 240e, 110e, transformer 1:1 (1.5 - 7 mhz).]

10.3 summary of operation

in both spdif and transparent dma modes, spdo sends alternating memory data buffers out across the output pin. software initially gives spdo two memory data buffers and enables the spdo unit. when the first buffer is sent, spdo requests a new buffer from software while switching over to use the other buffer, etc. transmission continues uninterrupted until the unit is disabled.

10.3.1 spdif mode

spdif driver software assembles spdif data in each memory data buffer. each memory data buffer consists of groups of 32-bit words in memory. each word describes the data to be transmitted for a single iec-958 sub-frame, including what type of preamble is to be included. each sub-frame is transmitted in 64-clock cycle intervals of the spdo clock, a programmable clock generated by the spdo direct digital synthesizer (dds).

10.3.2 transparent dma mode

in transparent dma mode, software prepares each data bit exactly as it is to be transmitted, in a series of 32-bit words in each memory data buffer. each 32-bit word is transmitted lsb first or msb first in 32-clock cycle intervals of the spdo clock, a programmable clock generated by the spdo direct digital synthesizer.

10.4 iec-958 serial format

figure 10-2 shows the serial format layout of an iec-958 block. a block starts with a special "b" pre-amble, and consists of 192 frames. the sample rate of all embedded audio data is equal to the frame rate. each frame consists of 2 sub-frames. sub-frame 1 always starts with an "m" pre-amble, except for sub-frame 1 in frame 0, which starts with a "b". sub-frame 2 always starts with a "w" pre-amble.

when iec-958 data carries 2-channel pcm data, one audio sample is transmitted in each sub-frame, "left" in sub-frame 1 and "right" in sub-frame 2. each sample can be 16 or 24 bits in length, where the msb is always aligned with bit slot 28 of the sub-frame. in case of more than 20 bits/sample, the aux field is used for the 4 lsbs.

when iec-958 data carries non-pcm audio, such as 1 or more streams of dolby ac-3 encoded data and/or mpeg audio, each sub-frame carries 16-bit data. the data of successive frames adds up to a payload data-stream which carries its own burst-data. this is described in [2].

programmers should refer to the iec-958 documents [1] and project 1937 document [2] for a precise description of the required values in each field for different types of consumer equipment. a complete discussion of this issue is outside the scope of this document.

the spdo block hardware only concerns itself with generating b, w and m preambles as well as generating the p (parity) bit. all other bits in the sub-frame are completely determined by software and copied verbatim from memory to output, subject only to bit-cell coding. the programmer must construct valid iec-958 blocks by constructing the right sequence of 32-bit words as described in section 10.7, "iec-958 memory data format."
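the preamble sequence of a block and the descriptor encoding of table 10-2 (section 10.7) can be combined into a small helper. the sketch below is illustrative: the function and constant names are hypothetical, and the caller is assumed to supply the 27 payload bits (sub-frame bits 4..30) already arranged as they are to be transmitted, since the field contents depend on the iec-958 documents rather than on the spdo hardware.

```c
#include <stdint.h>

/* Preamble codes for descriptor bits 3..0, from table 10-2. */
enum { SPDO_PRE_B = 0x0, SPDO_PRE_M = 0x1, SPDO_PRE_W = 0x2 };

/* 192 frames of 2 sub-frames each. */
#define SPDO_SUBFRAMES_PER_BLOCK 384u

/* Preamble code for sub-frame k (0..383) of a block: "b" for the first
   sub-frame of frame 0, "m" for every other sub-frame 1, "w" for every
   sub-frame 2. */
unsigned spdo_preamble(unsigned k)
{
    if (k == 0)
        return SPDO_PRE_B;
    return (k & 1u) ? SPDO_PRE_W : SPDO_PRE_M;
}

/* Descriptor word for sub-frame k. Bit 0 of payload27 becomes sub-frame
   bit 4; anything above 27 bits is masked off so that descriptor bit 31
   stays 0 as table 10-2 requires. */
uint32_t spdo_descriptor(unsigned k, uint32_t payload27)
{
    return ((payload27 & 0x07FFFFFFu) << 4) | spdo_preamble(k);
}

/* Fill one complete block of descriptors from 384 payload values. */
void spdo_fill_block(uint32_t *desc, const uint32_t *payload27)
{
    for (unsigned k = 0; k < SPDO_SUBFRAMES_PER_BLOCK; k++)
        desc[k] = spdo_descriptor(k, payload27[k]);
}
```

one block occupies 384 * 4 = 1536 bytes, which is a multiple of 64 bytes, so a dma buffer holding a whole number of blocks automatically meets the buffer length rule of section 10.7.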
[figure 10-2. serial format of an iec958 block. a block starts at frame 0, indicated by the unique b pre-amble, and runs through frame 191; each frame holds sub-frame 1 (b or m pre-amble) and sub-frame 2 (w pre-amble). a 2-channel pcm sub-frame holds the b, w or m pre-amble, the aux. field, the sample data (lsb to msb) and the vucp bits (validity flag, user data, channel status, parity bit). a non-pcm audio sub-frame holds the pre-amble, an unused (0) field, 16-bit data (lsb to msb) and the vucp bits.]

10.5 iec-958 bit cell and pre-amble

each data bit in iec-958 is transmitted using bi-phase mark encoding. in bi-phase mark encoding, each data bit is transmitted as a cell consisting of two consecutive binary states. the first state of a cell is always inverted from the second state of the previous cell. the second state of a cell is identical to the first state if the data bit value is a "0", and inverted if the data bit value is a "1".

pre-ambles are coded as bi-phase mark violations, where the first state of a cell is not the inverse of the last state of the previous cell.

the duration of each state in a cell is called a ui (unit interval), so that each cell is 2 uis long. in spdo, the length of a ui is 1 spdo clock cycle as determined by the settings of the dds (see section 10.8, "sample rate programming").

figure 10-3 illustrates the transmission format of the 8-bit data value "10011000", as well as the transmission format of the 3 pre-ambles. note that each pre-amble always starts with a rising edge. this is made possible thanks to the presence of the parity bit, which always guarantees an even number of "1" bits in each sub-frame.

[figure 10-3. bi-phase mark data transmission: ui and cell timing for the bit sequence "1" "0" "0" "1" "1" "0" "0" "0", and the bi-phase mark violations that form the b, m and w pre-ambles.]

10.6 iec-958 parity

the parity bit, or p bit in figure 10-2, is computed by the spdo hardware. the p bit value is set such that bit cells 4 to 31 inclusive contain an even number of "1"s (and hence an even number of "0"s). the p bit is bi-phase mark encoded using the same method as for all other bits.

10.7 iec-958 memory data format

the dspcpu software must prepare a memory data structure that instructs the spdo hardware to generate correct iec-958 blocks. this data structure consists of 32-bit words with the following content:

table 10-2. spdif sub-frame descriptor word
bits | definition
31 (msb) | this bit must be a "0" for future compatibility.
30..4 | data value for bits 4..30 of the subframe, exactly as they are to be transmitted. hardware will perform the bi-phase mark encoding and parity generation.
3..0 (lsb) | 0000 = generate a b preamble. 0001 = generate a m preamble. 0010 = generate a w preamble. 0011 .. 1111 = reserved for future.

the data structure for a block consists of 384 of these 32-bit descriptor words, one for each subframe of the block, with the correct b, m, w values. all data content, including the u, c and v flags, is fully under control of the software that builds each block.

a dma buffer handed to the hardware is required to be a multiple of 64 bytes in length. it can contain 1 or more complete blocks, or a block may straddle dma buffer boundaries. the 64-byte length will result in dma buffers that contain a multiple of 16 sub-frames.

note that the descriptor structure is a 32-bit word memory data structure, and is hence subject to processor endian-ness. to allow software to be efficient in both little-endian and big-endian operation, the spdo block spdo_ctl register has an endian-ness bit "little_endian". the spdo block performs byte swapping when loading the spdif descriptors as follows.
- if little_endian = 1, 32-bit words at address "a" will be assembled from bytes (a+3, a+2, a+1, a), with the byte at "a+3" containing the msbs and the byte at "a" the lsbs.
- if little_endian = 0, 32-bit words at address "a" will be assembled from bytes (a, a+1, a+2, a+3), with the byte at "a" containing the msbs and the byte at "a+3" the lsbs.

10.8 sample rate programming

in the spdo unit, the frame rate always equals f_s, the sample rate of embedded audio. this relation holds for pcm as well as for dolby ac-3 and mpeg encoded audio. each frame consists of 128 unit intervals (uis):

$$ f_s = \frac{f_{dds}}{128} \qquad \text{(eq. 1)} $$

the length of a ui is determined by the frequency setting of the dds (direct digital synthesizer) in the spdo block. the dds can be programmed to emit frequencies from approx. 1 hz to 80 mhz in steps of approx. 0.3 hz, with a jitter of approx. 750 psec (at a dspcpu frequency of 143 mhz, see equations below).

programming is accomplished through the frequency mmio register. the relation between frequency register value, dspcpu clock value and synthesized frequency is:

$$ \mathrm{frequency} = 2^{31} + \frac{f_{dds} \cdot 2^{32}}{9 \cdot f_{dspcpu}} \qquad \text{(eq. 2)} $$

putting equation 1 and 2 above together yields the formula for setting frequency to accomplish a given sample rate:

$$ \mathrm{frequency} = 2^{31} + \frac{f_s \cdot 2^{39}}{9 \cdot f_{dspcpu}} $$

the maximum jitter of the dds synthesizer is given by the jitter equation that accompanies table 10-3.
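as a worked example of the frequency formulas in section 10.8, the sketch below computes the register value with 64-bit integer arithmetic. the function names are hypothetical. rounding to the nearest integer is an assumption; it reproduces the 143 mhz entries of table 10-3 that were checked against it.

```c
#include <stdint.h>

/* FREQUENCY register value for a desired DDS output frequency (eq. 2):
   2^31 + f_dds * 2^32 / (9 * f_dspcpu), rounded to nearest.
   f_dds up to the stated 80 MHz keeps the numerator within 64 bits. */
uint32_t spdo_freq_from_dds(uint64_t f_dds_hz, uint64_t f_dspcpu_hz)
{
    uint64_t den = 9u * f_dspcpu_hz;
    uint64_t num = f_dds_hz << 32;
    return (uint32_t)(0x80000000u + (num + den / 2u) / den);
}

/* FREQUENCY register value for an audio sample rate: each frame takes
   128 unit intervals, so f_dds = 128 * f_s (eq. 1). */
uint32_t spdo_freq_from_sample_rate(uint32_t fs_hz, uint64_t f_dspcpu_hz)
{
    return spdo_freq_from_dds(128u * (uint64_t)fs_hz, f_dspcpu_hz);
}
```

for f_s = 48 khz and a 143 mhz dspcpu clock, `spdo_freq_from_sample_rate(48000, 143000000)` yields 0x8138dca1. the same helper `spdo_freq_from_dds()` applies to transparent mode (section 10.9) by passing the desired bitrate as the dds frequency.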
table 10-3 shows settings for common sample rate and dspcpu clock combinations:

table 10-3. spdif sample rate setting
f_s (khz) | f_dspcpu (mhz) | frequency (hexadecimal) | ui (nsec) | jitter (nsec)
32.000 | 143 | 0x80d09316 | 244.14 | 0.777
32.000 | 166 | 0x80b3acf8 | 244.14 | 0.669
32.000 | 180 | 0x80a5b36e | 244.14 | 0.617
44.100 | 143 | 0x811f711b | 177.15 | 0.777
44.100 | 166 | 0x80f79d93 | 177.15 | 0.669
44.100 | 180 | 0x80e45b47 | 177.15 | 0.617
48.000 | 143 | 0x8138dca1 | 162.76 | 0.777
48.000 | 166 | 0x810d8375 | 162.76 | 0.669
48.000 | 180 | 0x80f88d25 | 162.76 | 0.617

$$ \mathrm{jitter} = \frac{1}{9 \cdot f_{dspcpu}} $$

the programmer is free to change frequency, and hence the system sample rate, to perform long-term tracking of any absolute timing source and/or control software buffer fullness. changes to the frequency register pull in or delay the next clock edge and have no instantaneous effect on clock level, i.e. the rate of phase progression is changed, not the phase.

10.9 transparent mode

when spdo is set to operate in transparent mode, it takes all 32 bits of the memory data and shifts them out verbatim, without bi-phase mark encoding, parity generation, or preamble. two transparent modes are provided, as determined by trans_mode in spdo_ctl: lsb first and msb first. one bit of memory data is transmitted for each dds clock, such that the frequency register value for a desired bitrate is given by the following equation:

$$ \mathrm{frequency} = 2^{31} + \frac{2^{32} \cdot \mathrm{bitrate}}{9 \cdot f_{dspcpu}} $$

the 32-bit memory word is constructed according to the same rules for little_endian as in section 10.7, "iec-958 memory data format."

10.10 dma operation

before enabling the spdo block, software must assign two buffers with data to spdo_base1, spdo_base2, and spdo_size (buffer size in bytes). each memory buffer size must be a multiple of 64 bytes regardless of the operating mode.

the spdo block is enabled by writing a "1" to spdo_ctl.trans_enable. once enabled, the first dma buffer is sent out at the programmed sample rate. once the first buffer is empty, buf1_active is negated, a timestamp is generated (see section 10.13, "timestamps") and the buf1_empty flag in spdo_status is asserted. if buf1_inten in spdo_ctl is also asserted, an interrupt to the dspcpu is generated. the spdo block continues emitting the data in dma buffer 2.

in normal operation, the dspcpu assigns a new buffer 1 full of data to spdo and signals this by writing a "1" to ack_buf1. the spdo block immediately negates the buf1_empty condition and the related interrupt request. once buffer 2 is empty, similar signaling occurs and the hardware switches back to using buffer 1.

10.11 dma error conditions

two types of error can occur during dma operation.

if the software fails to provide a new buffer of data in time, and both dma buffers empty out, the spdo hardware raises the underrun flag in spdo_status. transmission switches over to the use of the next buffer, but the data transmitted is incorrect. if udr_inten is asserted, an interrupt will be generated. the underrun flag is sticky, i.e. it will remain asserted until the software clears it by writing a "1" to ack_udr.

a lower level error can also occur when the limited size internal buffer empties out before it can be refilled across the highway. this situation can arise only if insufficient bandwidth has been requested from the highway. in this case, the hbe error flag is raised. refer to section 10.17, "hbe and highway latency," for a description of how to set the arbiter latency correctly.

10.12 interrupts

the spdo block uses interrupt src num 25, with interrupt vector mmio offset 0x1008e4. it is highly recommended that the interrupt be operated in level-sensitive mode only.

the spdo block generates an interrupt if one of the following status bit flags, and its corresponding inten_xxx flag, are set: buf1_empty, buf2_empty, hbe, underrun.

all these status flags are sticky, i.e. they are asserted by hardware when a certain condition occurs, and remain set until the interrupt handler explicitly clears them by writing a "1" to the corresponding ack bit in spdo_ctl.

the spdo hardware takes the flag away in the clock cycle after the ack is received. this allows immediate return from interrupt once the ack has been performed.

10.13 timestamps

any outgoing dma buffer is assigned a 32-bit "time of departure" timestamp. the counter used to generate timestamps uses the dspcpu clock and the same reset time as the dspcpu cccount register, resulting in a value that corresponds to the 32 lsbs of cccount, provided that pcsw.cs=1, i.e. the real cccount counter increments on every clock cycle.
the timestamp can be read in the dma interrupt handler as mmio register spdo_tstamp. its contents correspond to the (synchronized) clock edge at which the last bit in the dma buffer was sent across the output signal pin.

10.14 mmio register description

figure 10-4. spdo unit status/control field mmio layout.
register (access) | mmio_base offset
spdo_status (r/ | 0x10 4c00
spdo_ctl (r/w) | 0x10 4c04
spdo_freq (r/w) | 0x10 4c08
spdo_base1 (r/w) | 0x10 4c0c
spdo_base2 (r/w) | 0x10 4c10
spdo_size (r/w) | 0x10 4c14
spdo_tstamp (r/o) | 0x10 4c18
fields shown in the figure (bit positions are not recoverable from this text): buf1_active, buf1_empty, buf2_empty, hbe (highway bandwidth error), underrun, reset, trans_enable, trans_mode, little_endian, sleepless, udr_inten, hbe_inten, buf2_inten, buf1_inten, ack_udr, ack_hbe, ack_buf2, ack_buf1, frequency, base1, base2, size (in bytes), timestamp.

table 10-4. spdo_status mmio register
field | type | description
buf1_empty | r/o | sticky flag, set if dma buffer 1 emptied by the spdo hardware. can only be cleared by software write to ack_buf1.
buf2_empty | r/o | sticky flag, set if dma buffer 2 emptied by the spdo hardware. can only be cleared by software write to ack_buf2.
hbe | r/o | highway bandwidth error. sticky flag, set if internal spdo buffers emptied before new data brought from memory. refer to section 10.17, "hbe and highway latency." can be cleared only by a software write to ack_hbe.
underrun | r/o | sticky flag, set if both dma buffers were emptied before a new full buffer was assigned by the dspcpu. the hardware has performed a normal buffer switch over and is emitting old data. can only be cleared by software write to ack_udr.
buf1_active | r/o | flag, set if the hardware is currently emitting dma buffer 1 data; negated when emitting dma buffer 2 data.

table 10-5. spdo_ctl mmio register
field | type | description
ack_buf1 | w/o | always reads as "0". write a "1" here to clear buf1_empty. this informs spdo that dma buffer 1 is now full. writing a "0" has no effect.
ack_buf2 | w/o | always reads as "0". write a "1" here to clear buf2_empty. this informs spdo that dma buffer 2 is now full. writing a "0" has no effect.
ack_hbe | w/o | always reads as "0". writing a "1" here clears hbe.
ack_udr | w/o | always reads as "0". writing a "1" here clears underrun.
buf1_inten | r/w | if buf1_empty asserted and this bit asserted, the src 25 interrupt line is asserted.
To ensure compatibility with future devices, any undefined MMIO bits should be ignored when read, and written as '0's.

The SPDO_FREQ register determines the frequency of operation of the DDS, and hence the sample rate of outgoing audio. Refer to Section 10.8, "Sample rate programming," and Section 10.9, "Transparent mode."

SPDO_BASE1 contains the memory address of DMA buffer 1. SPDO_BASE2 contains the memory address of DMA buffer 2. SPDO_SIZE determines the size, in bytes, of both DMA buffers. Assignments to SPDO_BASE1, SPDO_BASE2 and SPDO_SIZE have no effect on the state of the SPDO_STATUS flags; the ACK_BUF1 and ACK_BUF2 bits signal the assignment of valid data to the DMA buffers. A base register should only be changed while its buffer is inactive, and the change should precede the ACK to that buffer.

SPDO_TSTAMP is a read-only register containing the cycle count at which the last bit from the last emptied buffer was transmitted across the output pin. Refer to Section 10.13, "Timestamps."

10.15 Reset

The SPDO block is reset by the global PNX1300 reset pin TRI_RESET# or by writing a '1' to the RESET bit in SPDO_CTL. The SPDO block is not affected by a DSPCPU reset initiated through the PCI block BIU_CTL register. Either reset method sets the SPDO block in the following state:
• SPDO_BASE1, SPDO_BASE2, SPDO_SIZE = 0
• SPDO_STATUS: all defined fields set to '0', except BUF1_ACTIVE = 1
• SPDO_CTL: all defined fields set to '0'

The SPDO block timestamp counter is reset by TRI_RESET# or by a DSPCPU reset initiated through BIU_CTL, so as to ensure that it stays synchronous to the CCCOUNT DSPCPU register.

10.16 Power down and sleepless

The SPDO block enters powerdown state whenever PNX1300 is put in global powerdown mode, except if the SLEEPLESS bit in SPDO_CTL is set.
In the latter case, the block continues DMA operation and will wake up the DSPCPU whenever an interrupt is generated. SPDO can be separately powered down by setting a bit in the BLOCK_POWER_DOWN register. For a description of powerdown, see Chapter 21, "Power management."

The SPDO block should not be active when applying global powerdown (TRANS_ENABLE = 0), or if active, SLEEPLESS should be asserted. SPDO should not be active if powered down separately.

If the block enters power-down state while transmission is enabled, its operation continues from the interrupted clock cycle, but the output signal generated by the block has undergone a pause that is unacceptable to external equipment.

10.17 HBE and highway latency

The SPDO unit uses one internal 64-byte buffer and two 32-bit holding registers. Under normal operation, the internal buffer is refilled from SDRAM fast enough to avoid missing any data, while data is being sent from the two 32-bit registers. If the highway arbiter is set up with an insufficient latency guarantee, the situation can arise in which the 64-byte buffer is not refilled in time. In that case the HBE error is raised, and some data has been irrevocably lost. The HBE condition is sticky, and can only be cleared by an explicit ACK_HBE.

Table 10-5. SPDO_CTL MMIO register (continued)

Field           Type   Description
BUF2_INTEN      r/w    If BUF2_EMPTY is asserted and this bit is asserted, the SRC 25 interrupt line is asserted.
HBE_INTEN       r/w    If HBE is asserted and this bit is asserted, the SRC 25 interrupt line is asserted.
UDR_INTEN       r/w    If UNDERRUN is asserted and this bit is asserted, the SRC 25 interrupt line is asserted.
SLEEPLESS       r/w    If '1', the SPDO block does not power down when PNX1300 goes into global power-down mode. If '0', the block does power down.
LITTLE_ENDIAN   r/w    If asserted, the 32-bit SPDIF data descriptor word or transparent mode data word is assembled using little-endian byte ordering, otherwise big-endian.
TRANS_MODE      r/w    The transmission mode should only be changed while transmission is disabled.
                       • 000: IEC-958 mode. Hardware performs bi-phase mark encoding, preamble generation, and parity generation, and transmits one IEC-958 subframe for each data descriptor word.
                       • 010: transparent mode, LSB first. The 32-bit data descriptor words are transmitted as is, LSB first.
                       • 011: transparent mode, MSB first. The 32-bit data descriptor words are transmitted as is, MSB first.
                       • Any other code: reserved for future extensions.
TRANS_ENABLE    r/w    Writing a '1' to this bit enables transmission per the selected mode. Writing a '0' here stops any ongoing transmission after completing any actions related to the current data descriptor word.
RESET           w/o    Writing a '1' to this bit resets the SPDO unit and should be used with extreme caution. Ongoing transmission will be interrupted, and receivers may be left in a strange state.
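The sticky-flag and acknowledge behaviour described in Tables 10-4 and 10-5 can be sketched as a small software model. This is not driver code: the bit positions below are placeholders chosen for illustration (the layout of Figure 10-4 is not reproduced here), and the structure and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder bit positions, NOT taken from Figure 10-4. */
enum {
    SPDO_BUF1 = 1u << 0,  /* BUF1_EMPTY / ACK_BUF1 / BUF1_INTEN */
    SPDO_BUF2 = 1u << 1,  /* BUF2_EMPTY / ACK_BUF2 / BUF2_INTEN */
    SPDO_HBE  = 1u << 2,  /* HBE / ACK_HBE / HBE_INTEN */
    SPDO_UDR  = 1u << 3   /* UNDERRUN / ACK_UDR / UDR_INTEN */
};

typedef struct {
    uint32_t status;  /* sticky flags, set only by the modelled hardware */
    uint32_t inten;   /* interrupt enable bits */
} spdo_model;

/* Hardware side: a condition occurred; the flag stays set until acknowledged. */
void spdo_hw_event(spdo_model *m, uint32_t flags)
{
    assert(m != NULL);
    m->status |= flags;
}

/* Software side: writing '1' to an ACK bit clears the matching flag,
 * writing '0' has no effect. */
void spdo_write_ack(spdo_model *m, uint32_t ack_bits)
{
    assert(m != NULL);
    m->status &= ~ack_bits;
}

/* The interrupt line is asserted while any enabled flag is set. */
int spdo_irq_asserted(const spdo_model *m)
{
    assert(m != NULL);
    return (m->status & m->inten) != 0;
}
```

In an interrupt handler built on this model, software would refill the emptied buffer first and only then write the corresponding ACK bit, since the ACK is what tells SPDO that the buffer holds valid data again.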
The highway arbiter needs to be programmed such that the SPDO unit's latency requirement can always be met. Refer to Chapter 20, "Arbiter," for details. The required latency can be computed as indicated below.

Given an output data rate f_s in samples/sec, 2 × 32 bits are required each sample interval. The arbiter should be set to have a latency so that the buffer is refilled before a sample interval expires. See Table 10-6 for example practical settings.

10.18 Literature references

[1] IEC-958 Digital Audio Interface, Part 1: General; Part 2: Professional applications; Part 3: Consumer applications.
[2] "Interface for non-PCM encoded audio bitstreams applying IEC958", Philips Consumer Electronics, June 6 1997. IEC 100C/WG11 (project 1937).

Table 10-6. SPDO block highway latency requirements

f_s (kHz)   Max. latency (nsec)
32.000      31250
44.100      22675
48.000      20833
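The entries of Table 10-6 are consistent with one sample interval, 1/f_s, truncated to whole nanoseconds. The following C sketch reproduces them under that reading; the function name is illustrative and the truncation is inferred from the table, not stated in the text.

```c
#include <assert.h>
#include <stdint.h>

/* Maximum highway latency in nanoseconds, taken as one sample interval
 * (1 / fs), truncated to an integer. */
uint32_t spdo_max_latency_ns(uint32_t fs_hz)
{
    assert(fs_hz > 0);
    return 1000000000u / fs_hz;
}
```

For example, `spdo_max_latency_ns(44100)` gives 22675, matching the 44.1 kHz row.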
Chapter 11. PCI interface

by Gert Slavenburg, Ken-Sue Tan, Babu Kandimalla

11.1 PCI overview

In this document, the generic PNX1300 name refers to the PNX1300 series, or the PNX1300/01/02/11 products.

PNX1300 includes a PCI interface for easy integration into personal computer applications, where the PCI bus is the standard for high-speed peripherals. In embedded applications, with PNX1300 serving as the main CPU, the PCI bus can interface to peripheral devices that implement functions not provided by the on-chip peripherals. See Figure 11-1.

The main function of the PCI interface is to connect the PNX1300 on-chip highway and PCI buses. A bus cycle on the internal highway that targets an address mapped into PCI space will cause the PCI interface to create a PCI bus cycle. Similarly, a bus cycle on PCI that targets an address mapped into PNX1300 memory space will cause the PCI interface to create a highway bus cycle targeted at SDRAM. For some operations, the PCI interface is explicitly programmed by the DSPCPU.

From PNX1300, only the DSPCPU and the image coprocessor (ICP) unit can cause the PCI interface to create PCI bus cycles; the other on-chip peripherals cannot see external hardware through the PCI interface. From PCI, SDRAM and most of the registers in MMIO space can be accessed by external PCI initiators.

The PCI interface implements DMA (also called block or burst) and non-DMA transfers. DMA transfers are interruptible on 64-byte boundaries. The PCI interface can service outbound (PNX1300 → PCI) and inbound (PCI → PNX1300) data flows simultaneously. Table 11-1 lists some of the features of the PCI interface.

PNX1300 DMA read transactions use the efficient "memory read multiple" PCI transaction unless this is explicitly disabled; see Section 11.6.5.

PNX1300 contains an on-board PCI_CLK generator for low-cost configurations. It can be enabled or disabled at boot time. See Section 13.1.
PNX1300 has a sideband control signal that allows glueless connection of simple slave peripherals directly to the PCI bus wires. This can be used to connect flash, ROM, SRAM, UARTs, etc. with 8-bit data and demultiplexed addresses. Refer to Chapter 22, "PCI-XIO external I/O bus."

Figure 11-1. Two typical system implementations: (a) shows PNX1300 as a PCI peripheral in a desktop PC, (b) shows an embedded system with PNX1300 as the host CPU.
[Block diagram. (a) PNX1300 as peripheral: host CPU (e.g., x86), interrupt controller and PCI bridge, with PCI bus arbiter, PCI agents and PNX1300 on the PCI bus. (b) PNX1300 as host CPU: PCI bus arbiter, PCI agents and PNX1300 on the PCI bus.]

Table 11-1. PCI interface characteristics

Characteristic                    Comments
PCI compliance                    PCI Local Bus Specification rev. 2.1
PCI speed                         Up to 33 MHz
Data bus width                    32-bit only
Address space                     32 bits (4 GB)
Voltage levels                    Drive and receive at either 3.3 V or 5 V
Burst mode                        Yes, with double buffering so that the maximum transfer rate (132 MB/sec) is sustainable
Posted write                      Yes, can be disabled
PCI "special cycle"               Not recognized
PCI "memory write & invalidate"   Supported for PNX1300 as initiator
PCI "interrupt acknowledge"       Not generated
PCI "dual-address cycle"          Not generated
11.2 PCI interface as an initiator

The following classes of operations invoked by PNX1300 cause the PCI interface to act as a PCI initiator:
• Transparent, single-word (or smaller) transactions caused by DSPCPU loads and stores to the PCI address aperture
• Explicitly programmed single-word I/O or configuration read or write transactions
• Explicitly programmed multi-word DMA transactions
• ICP DMA

11.2.1 DSPCPU single-word loads/stores

From the point of view of programs executed by PNX1300's DSPCPU, there are three apertures into PNX1300's 4-GB memory address space:
• SDRAM space (0.5 to 64 MB; programmable)
• MMIO space (2 MB)
• PCI space

MMIO registers control the positions of the address-space apertures (see Chapter 3, "DSPCPU architecture"). The SDRAM aperture begins at the address specified in the MMIO register DRAM_BASE and extends upward to the address in the DRAM_LIMIT register. The 2-MB MMIO aperture begins at the address in MMIO_BASE (defaults to 0xefe00000 after power-up).

All addresses that fall outside these two apertures are assumed to be part of the PCI address aperture. References by DSPCPU loads and stores to the PCI aperture are reflected to external PCI devices by the coordinated action of the data cache and PCI interface.

When a DSPCPU load or store targets the PCI aperture (i.e., neither of the other two apertures), the DSPCPU's data cache automatically carries out a special sequence of events. The data cache writes to the PCI_ADR and (if the DSPCPU operation was a store) PCI_DATA registers in the PCI interface and asserts (load) or de-asserts (store) the internal signal pci_read_operation (a direct connection from the data cache to the PCI interface).

While the PCI interface executes the PCI bus transaction, the DSPCPU is held in the stall state by the data cache.
When the PCI interface has completed the transaction, it asserts the internal signal pci_ready (a direct connection from the PCI interface to the data cache). When pci_ready is asserted, the data cache finishes the original DSPCPU operation by reading data from the PCI_DATA register (if the DSPCPU operation was a load) and releasing the DSPCPU from the stall state.

Explicit writes to PCI_ADR, PCI_DATA

The PCI_ADR and PCI_DATA registers are intended to be used only by the data cache. Explicit writes are not allowed and may cause undetermined results and/or data corruption.

11.2.2 I/O operations

Explicit programming by DSPCPU software is the only way to perform transactions to PCI I/O space. DSPCPU software writes three MMIO registers in the following sequence:
1. The IO_ADR register.
2. The IO_DATA register (if the PCI operation is a write).
3. The IO_CTL register (controls direction of data movement and which bytes participate).

The PCI interface starts the PCI-bus I/O transaction when software writes to IO_CTL. The interface can raise a DSPCPU interrupt at the completion of the I/O transaction (see the BIU_CTL register definition in Section 11.6.5, "BIU_CTL register") or the DSPCPU can poll the appropriate status bit (see the BIU_STATUS register definition in Section 11.6.4, "BIU_STATUS register"). Note that PCI I/O transactions should not be initiated if a PCI configuration transaction described below is pending. This is a strict implementation limitation.

The fully detailed description of the steps needed can be found in Section 11.6.13, "IO_CTL register."

11.2.3 Configuration operations

As with I/O operations, explicit programming by DSPCPU software is the only way to perform transactions to PCI configuration space. DSPCPU software writes three MMIO registers in the following sequence:
1. The CONFIG_ADR register.
2. The CONFIG_DATA register (if the PCI operation is a write).
3. The CONFIG_CTL register (controls direction of data movement and which bytes participate).

The PCI interface starts the PCI-bus configuration transaction when software writes to CONFIG_CTL. As with the I/O operations, the BIU_STATUS and BIU_CTL registers monitor the status of the operation and control interrupt signaling. Note that PCI configuration space transactions should not be initiated if a PCI I/O transaction described above is pending. This is a strict implementation limitation.

The fully detailed description of the steps needed can be found in Section 11.6.10, "CONFIG_CTL register."

11.2.4 DMA operations

The PCI interface can operate as an autonomous DMA engine, executing block-transfer operations at maximum PCI bandwidth. As with I/O and configuration operations, DSPCPU software explicitly programs DMA operations.

General-purpose DMA

For DMA between SDRAM and PCI, DSPCPU software writes three MMIO registers in the following sequence:
1. The SRC_ADR and DEST_ADR registers.
2. The DMA_CTL register (controls direction of data movement and amount of data transferred).
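The programming sequences above, together with the restriction that PCI I/O and PCI configuration transactions must not be pending at the same time, can be sketched against a software model of the registers. The structure, the busy flags (stand-ins for BIU_STATUS bits) and the function names are hypothetical; control-word encodings are defined in Sections 11.6.10 and 11.6.13 and are passed through here as opaque values. Refusing a second transaction of the same kind is an additional assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Software stand-in for the BIU registers involved; not the real MMIO map. */
typedef struct {
    uint32_t io_adr, io_data, io_ctl;
    uint32_t config_adr, config_data, config_ctl;
    bool io_busy;      /* models "I/O transaction pending" */
    bool config_busy;  /* models "configuration transaction pending" */
} biu_model;

/* I/O sequence: IO_ADR, then IO_DATA (writes only), then IO_CTL.
 * wdata == NULL means a read. Returns false if a transaction is pending. */
bool biu_start_io(biu_model *biu, uint32_t adr, const uint32_t *wdata, uint32_t ctl)
{
    assert(biu != NULL);
    if (biu->io_busy || biu->config_busy)
        return false;
    biu->io_adr = adr;
    if (wdata != NULL)
        biu->io_data = *wdata;
    biu->io_ctl = ctl;          /* the write to IO_CTL starts the transaction */
    biu->io_busy = true;
    return true;
}

/* Configuration sequence: CONFIG_ADR, CONFIG_DATA (writes only), CONFIG_CTL. */
bool biu_start_config(biu_model *biu, uint32_t adr, const uint32_t *wdata, uint32_t ctl)
{
    assert(biu != NULL);
    if (biu->io_busy || biu->config_busy)
        return false;
    biu->config_adr = adr;
    if (wdata != NULL)
        biu->config_data = *wdata;
    biu->config_ctl = ctl;      /* the write to CONFIG_CTL starts the transaction */
    biu->config_busy = true;
    return true;
}

/* Models completion, as software would observe by polling or by interrupt. */
void biu_complete(biu_model *biu)
{
    assert(biu != NULL);
    biu->io_busy = false;
    biu->config_busy = false;
}
```

Real code would poll the appropriate BIU_STATUS bit, or wait for the completion interrupt, before issuing the next single-data-phase transaction.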
The PCI interface begins the PCI-bus transactions when software writes to DMA_CTL. As with the I/O and configuration operations, the BIU_STATUS and BIU_CTL registers monitor the status of the operation and control interrupt signaling.

The fully detailed description of the steps needed to start a DMA transaction can be found in Section 11.6.16, "DMA_CTL register."

Image-coprocessor DMA

The PCI interface also executes DMA transactions for the image coprocessor (ICP). The ICP performs rapid post-processing of image data and writes it at PCI DMA speed to a PCI graphics card frame buffer. The ICP cannot perform PCI read transactions. BIU_CTL.IE (ICP DMA enable) should be asserted before attempting ICP PCI operation. Programming of ICP DMA is described in Section 14.6, "Operation and programming."

11.3 PCI interface as a target

The PNX1300 PCI interface responds as a target to external initiators for a limited set of PCI transaction types:
• Configuration read/write
• Memory read/write, read line, and read multiple to the PNX1300 SDRAM or MMIO apertures. See Section 11.8, "Limitations."

PNX1300 ignores PCI transactions other than the above.

11.4 Transaction concurrency, priorities, and ordering

The PCI interface can be processing more than one operation at a given time. There are five distinct classes of operations implemented by the PCI interface:
1. DSPCPU load/store to PCI space.
2. PCI I/O read/write and PCI configuration read/write.
3. General-purpose DMA read/write.
4. ICP DMA write.
5. External-PCI-agent-initiated read/write (to a PNX1300 on-chip resource).

If the active general-purpose DMA transaction is a read, up to five transactions, one from each class, can be active simultaneously. If the active general-purpose DMA operation is a write, then only four transactions can be active simultaneously because general-purpose DMA writes force ICP DMA writes to wait until the general-purpose DMA completes. When a general-purpose DMA write is pending, an in-progress ICP DMA operation is suspended at the next 64-byte block boundary and waits until the completion of the DMA write operation. General-purpose DMA reads are interleaved with ICP DMA writes, so both can be active concurrently.

PCI single-data-phase transactions (DSPCPU load/store, I/O read/write, and configuration read/write) are executed in the order they are issued to the PCI interface. Note the strict implementation limitation that PCI I/O and PCI configuration transactions cannot be simultaneously active.

11.5 Registers addressed in PCI configuration space

Since it is a PCI device, PNX1300 has a set of configuration registers to determine PCI behavior. PCI configuration registers allow full relocation of interrupt binding and address mapping by the system's host processor. This relocatability of PCI-space parameters eases installation, configuration, and system boot.

The PCI standard specifies a 64-byte PCI configuration header region within a reserved 256-byte block. During system initialization, host system software scans the PCI bus, looking for PCI headers, to determine what PCI devices are present in the system. The fields in the header region uniquely identify the PCI device and allow the host to control the device in a generic way. Figure 11-2 shows the layout of the configuration header region.

Figure 11-2 also shows the initial values for the configuration registers. Some registers, such as Device ID, have hardwired values, while others are programmed by software. Still others are set automatically from the external boot ROM during PNX1300's power-up initialization.
11.5.1 Vendor ID register

For PNX1300, the value of the 16-bit Vendor ID field is hardwired to 0x1131 (Philips). This value identifies the manufacturer of a PCI device. Valid vendor identifiers are assigned by the PCI Special Interest Group (PCI SIG) to ensure uniqueness. The value 0xffff is reserved and must be returned by the host/PCI bridge when an attempt is made to read a non-existent device's Vendor ID configuration register.

11.5.2 Device ID register

For PNX1300, the value of the 16-bit Device ID field is hardwired to 0x5402. The Device ID is assigned by the manufacturer to uniquely identify each PCI device it makes.

11.5.3 Command register

The 16-bit Command register provides basic control over a PCI device's ability to generate and/or respond to PCI bus cycles. According to the PCI specification, after reset, all bits in this register are cleared to '0' (except for a device that must be initially enabled). Clearing all bits to '0' logically disconnects the device from the PCI bus for all accesses except configuration accesses.

The Command register format is shown in Figure 11-3. Table 11-2 summarizes the field values. Note that the values listed as "normally taken" are not necessarily the reset values, i.e. the Command register is reset to all '0's, meaning the features are disconnected on reset.

Following are detailed descriptions of the Command register fields.
I/O (I/O access enable). This bit controls a device's ability to respond to I/O-space accesses. A value of '0' disables PCI device response; a value of '1' enables response. This bit is hardwired to '0' because all PNX1300 internal registers are memory mapped.

MA (memory access enable). This bit controls response to memory-space accesses. A value of '0' disables PNX1300 response; a value of '1' enables response. This bit is set to '0' at power-up; software can set this bit to '1' with a configuration write.

Figure 11-2. PCI configuration header region register layout and initial values. (All values in hex.)

Configuration-space
address offset        Registers (bit 31 down to bit 0)
00                    Device ID (0x5402), Vendor ID (0x1131)
04                    Status, Command
08                    Class code (0x048000), Revision ID (see text)
0c                    BIST (0x00), Header type (0x00), Latency timer, Cache line size
10                    DRAM base address
14                    MMIO base address
18, 1c, 20, 24        Four other base address registers (0)
28                    Reserved register (0)
2c                    Subsystem ID, Subsystem vendor ID
30                    Expansion ROM base address (0)
34, 38                Two reserved registers (0)
3c                    Max_Lat (0x01), Min_Gnt (0x03), Interrupt pin (0x01), Interrupt line

Key to the per-bit markings of the figure: normally '0'; hardwired to ground; normally one; hardwired to VDD; set by software; set by software if aperture size allows; set by hardware from boot EEPROM; prefetchable.
Figure 11-3. Command register format.

Bit      Field
0        I/O
1        MA
2        EM
3        SC
4        MWI
5        VGA
6        PAR
7        WAIT
8        SERR#
9        FB
10-15    Reserved
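The bit positions of Figure 11-3, combined with the hardwired bits described in Section 11.5.3 and Table 11-2, can be captured in a short C sketch. The macro and function names are illustrative; the model only reflects which bits the text says software can set, and does not model the automatic setting of EM through the BIU_CTL HE bit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Command register bit positions, per Figure 11-3. */
enum {
    PCI_CMD_IO   = 1u << 0,  /* hardwired to 0 in PNX1300 */
    PCI_CMD_MA   = 1u << 1,
    PCI_CMD_EM   = 1u << 2,
    PCI_CMD_SC   = 1u << 3,  /* hardwired to 0 */
    PCI_CMD_MWI  = 1u << 4,
    PCI_CMD_VGA  = 1u << 5,  /* hardwired to 0 */
    PCI_CMD_PAR  = 1u << 6,
    PCI_CMD_WAIT = 1u << 7,  /* hardwired to 0 */
    PCI_CMD_SERR = 1u << 8,
    PCI_CMD_FB   = 1u << 9   /* hardwired to 0 */
};

/* Bits that software can set: MA, EM, MWI, PAR, SERR#. */
#define PNX1300_CMD_WRITABLE \
    (PCI_CMD_MA | PCI_CMD_EM | PCI_CMD_MWI | PCI_CMD_PAR | PCI_CMD_SERR)

/* Model of a configuration write: hardwired and reserved bits stay 0. */
void pnx1300_cmd_write(uint16_t *cmd, uint16_t value)
{
    assert(cmd != NULL);
    *cmd = (uint16_t)(value & PNX1300_CMD_WRITABLE);
}

/* Address parity errors are signaled on SERR# only if SERR# and PAR are both set. */
int pnx1300_serr_on_address_parity(uint16_t cmd)
{
    uint16_t both = PCI_CMD_PAR | PCI_CMD_SERR;
    return (cmd & both) == both;
}
```

Writing all ones through this model leaves 0x0156 in the register, i.e. only the five implemented enable bits.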
EM (enable mastering). This bit controls the PNX1300 PCI interface's ability to act as a PCI master. A value of '0' prevents the PCI interface from initiating PCI accesses; a value of '1' allows the PCI interface to initiate PCI accesses.

Note that the EM bit is automatically set to '1' whenever the HE bit in the BIU_CTL register is set to '1' (see Section 11.6.5, "BIU_CTL register"). Mastering must be enabled for PNX1300 to serve as PCI host processor.

EM is set to '0' at power-up. Host system software can set this bit to '1' with a configuration write.

SC (special cycle). This bit controls PCI device recognition of special-cycle operations. A value of '0' causes a PCI device to ignore all special cycles; a value of '1' allows a PCI device to monitor special cycle operations. This bit is hardwired to '0' in PNX1300.

MWI (memory write and invalidate). This bit determines a PCI device's ability to generate memory-write-and-invalidate commands. A value of '1' allows a PCI device to generate memory-write-and-invalidate commands; a value of '0' forces the PCI device to use memory-write commands instead. PNX1300 implements this bit. The conditions under which PNX1300 DMA transactions generate memory-write-and-invalidate are described in Section 11.6.16, "DMA_CTL register." Details of operation can be found in Section 11.5.7, "Cache line size register." Image coprocessor DMA writes always use regular memory-write transactions.

VGA (VGA palette snoop). This bit controls how VGA-compatible PCI devices handle accesses to their palette registers. This bit is hardwired to '0'.

PAR (parity error response). This bit controls signaling of parity errors (data or address). A value of '0' causes the PCI interface to ignore parity errors; a value of '1' causes the PCI interface to report parity errors on the PERR# PCI signal. This bit is set to '0' at power-up; since the PCI interface checks parity, software can set this bit to '1' with a configuration write.

WAIT (wait-cycle control). This bit controls whether or not a PCI device does address/data stepping. PCI devices that never do stepping must hardwire this bit to '0'. Since PNX1300 does not implement stepping, this bit is hardwired to '0'.

SERR# (SERR# enable). This bit enables the driver of the SERR# pin (system error): a value of '0' disables it, a value of '1' enables it. All PCI devices that have an SERR# pin must implement this bit. This bit is set to '0' after reset; it can be set to '1' with a configuration write. SERR# and PAR must both be set to '1' to allow signaling of address parity errors on the SERR# signal.

FB (fast back-to-back enable). This bit controls whether or not a PCI master can do fast back-to-back transactions to different devices. A value of '0' means fast back-to-back transactions are only allowed when the transactions are to the same agent; a value of '1' means the master is allowed to generate fast back-to-back transactions to different agents. Initialization software will set this bit if all targets are capable of fast back-to-back transactions. In PNX1300, this bit is hardwired to '0'.

Reserved. Reads from reserved bits return '0'; writes to reserved bits cause no action.

11.5.4 Status register

The Status register is used to record information about PCI bus events. The Status register format is shown in Figure 11-4. Table 11-3 lists the Status register fields.

Reserved. Reads from reserved bits return '0'; writes to reserved bits cause no action.

66M (66-MHz capable). This bit is hardwired to '0' for PNX1300 (PCI runs at 33 MHz maximum).

UDF (user-definable features). Since the PNX1300 PCI interface does not implement PCI user-definable features, this bit is hardwired to '0'.

FBC (fast back-to-back capable).
The PNX1300 PCI interface does not support fast back-to-back capability, so this bit is hardwired to '0'.

DPD (data parity detected). Since the PNX1300 PCI interface can act as a PCI bus initiator, this bit is implemented. DPD is set in the initiator's Status register when:
• the PAR (parity-error response) bit in the Command register is set, and
• the initiator asserted PERR# or detected it asserted by the target (during a write cycle).

Table 11-2. Field values for Command register

Field      Value explanation
I/O        Hardwired to 0 (ignore I/O space accesses)
MA         0: no recognition of memory-space accesses; 1: recognizes memory-space accesses
EM         0: cannot act as PCI initiator; 1: can act as PCI initiator
SC         Hardwired to 0 (ignore special cycle accesses)
MWI        0: cannot generate memory write and invalidate; 1: can generate memory write and invalidate
VGA        Hardwired to 0
PAR        0: ignore parity errors; 1: acknowledge parity errors
SERR#      0: disable driver for SERR# pin; 1: enable driver for SERR# pin
FB         0: fast back-to-back only to same agent; 1: fast back-to-back to different agents
Reserved   Writes ignored; reads return 0

Figure 11-4. Status register format.

Bit      Field
0-4      Reserved
5        66M
6        UDF
7        FBC
8        DPD
9-10     DEVSEL
11       STA
12       RTA
13       RMA
14       SSE
15       DPE

DEVSEL (device select timing). This read-only field defines the slowest timing that will be used for the DEVSEL# signal when PNX1300 is a target on the PCI bus. Table 11-4 shows the allowable encodings and meanings. These bits are hardwired to '01' to indicate that PNX1300 uses a "medium" DEVSEL# timing.

STA (signaled target abort). PNX1300's PCI interface sets this bit when it is a target device and aborts a transaction.

RTA (receive target abort). PNX1300's PCI interface sets this bit when it is the initiating device and the transaction is aborted by the target device. (All initiating devices must implement this bit.)

RMA (receive master abort). PNX1300's PCI interface sets this bit when it is the initiating device and aborts a transaction (except when the transaction is a special cycle). (All initiating devices must implement this bit.)

SSE (signaled system error). PNX1300's PCI interface sets this bit when it asserts the SERR# signal. (PNX1300 can generate SERR#, so this bit is implemented; devices incapable of generating SERR# need not implement SSE.)

DPE (detected parity error). PNX1300's PCI interface sets this bit when it detects a parity error, even if parity error handling is disabled. (The PAR bit in the Command register enables the handling of parity errors.)

11.5.5 Revision ID register

The value in the Revision ID register is a read-only value chosen by the manufacturer to indicate product revisions. For the PNX1300 product family, the two MSBs of the Revision ID indicate the fab where the part was manufactured. The next two bits indicate an all-layer revision number, and the 4 LSBs indicate metal layer revisions. Each all-layer revision adds 0x10 to the Revision ID and resets the 4 LSBs to '0'.
TriMedia devices that are not pin- or function-compatible will use the same Revision ID convention, but with a revised Device ID.

11.5.6 Class code register

The value in the Class code register is read-only. System software uses the Class code register to identify the generic function of the device, and in some cases, the class code can specify a register-level programming interface.

Class code consists of three 1-byte fields as shown in Figure 11-5. The value of the upper byte, base class code, broadly classifies the function of the device. The value of the middle byte, subclass code, identifies the function more specifically. The value of the lower byte specifies a register-level programming interface so that device-independent software can interact with the device.

The meanings of the base class byte values are shown in Table 11-6. The value of base class is hardwired to 0x04 since PNX1300 is a multimedia device. Currently, there are no specific register-level programming interfaces defined for multimedia devices.

Table 11-7 lists the defined subclasses of multimedia devices. PNX1300 is both a video and audio multimedia device, so its subclass value is hardwired to 0x80.

Table 11-3. Status register fields

Field      Characteristics
Reserved   Writes ignored; reads return 0
66M        PCI bus speed (hardwired to 0: 33 MHz)
UDF        User-definable features (hardwired to 0: none)
FBC        Fast back-to-back capable (hardwired to 0: unsupported)
DPD        Data parity detected
DEVSEL     DEVSEL# signal timing (hardwired to 01: "medium")
STA        Signaled target abort
RTA        Receive target abort
RMA        Receive master abort
SSE        Signaled system error
DPE        Detected parity error

Table 11-4. DEVSEL encodings

DEVSEL   Meaning
00       Fast
01       Medium
10       Slow
11       Reserved

Table 11-5. Actual Revision ID values

Value (hex)   Product           Description
0x80          TM-1300           Original mask, TM1F-1.0
0x81          TM-1300           1st metal revision, TM1F-1.1
0x82          TM-1300           2nd metal revision, TM1F-1.2
0x83          PNX1300/01/02/11  3rd metal revision, TM1F-1.3

Figure 11-5. Class-code register format.

Bits     Field
23-16    Base class code
15-8     Subclass code
7-0      Programming interface
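The Revision ID convention of Section 11.5.5 and the class-code layout of Figure 11-5 can be illustrated with a small decoding sketch in C. The type and function names are hypothetical; the field boundaries follow the text (two MSBs for the fab, the next two bits for the all-layer revision, four LSBs for the metal revision).

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    unsigned fab;        /* bits 7..6 */
    unsigned all_layer;  /* bits 5..4 */
    unsigned metal;      /* bits 3..0 */
} pnx1300_revision;

pnx1300_revision pnx1300_decode_revision(uint8_t revision_id)
{
    pnx1300_revision r;
    r.fab = (revision_id >> 6) & 0x3u;
    r.all_layer = (revision_id >> 4) & 0x3u;
    r.metal = revision_id & 0xFu;
    return r;
}

typedef struct {
    uint8_t base_class;  /* bits 23..16 */
    uint8_t subclass;    /* bits 15..8 */
    uint8_t prog_if;     /* bits 7..0 */
} pci_class_code;

pci_class_code pci_decode_class(uint32_t class_code)
{
    pci_class_code c;
    assert(class_code <= 0xFFFFFFu);  /* the class code is a 24-bit value */
    c.base_class = (uint8_t)(class_code >> 16);
    c.subclass = (uint8_t)(class_code >> 8);
    c.prog_if = (uint8_t)class_code;
    return c;
}
```

Decoding 0x83 gives metal revision 3 with all-layer revision 0, in line with the "3rd metal revision" entry of Table 11-5, and decoding 0x048000 gives base class 0x04 (multimedia) with subclass 0x80 (other multimedia device).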
11.5.7 Cache line size register

This field only matters when the MWI bit in configuration space is set. The value of the Cache line size register specifies the host system cache line size in units of 32-bit words. Initiating devices, such as the PNX1300, that can generate memory-write-and-invalidate commands must implement this register. When implemented, the cache line size allows initiators participating in the PCI caching protocol to retry burst accesses at cache-line boundaries.

This register is implemented in PNX1300. In the PNX1300, PCI DMA performs write-and-invalidate cycles as per Table 11-8. ICP DMA and CPU PCI writes are performed using normal memory-write cycles.

11.5.8 Latency timer register

The value of the Latency timer register specifies the minimum number of PCI clock cycles the PNX1300 BIU (as initiator) is allowed to own the PCI bus. This register is readable and writable in PCI configuration space.

This register must be writable in any PCI-initiating device that can burst more than two data phases. In the PNX1300 PCI interface, the least-significant three bits are hardwired to '0' and software can program any value into the most-significant five bits. This permits software to specify the time slice with a minimum granularity of eight PCI clocks. A value of '0' signifies maximum latency, i.e. 256 PCI clocks.

11.5.9 Header type register

The value of the Header type register defines the format of words 16 through 63 in configuration space and whether or not the device contains multiple functions. Figure 11-6 shows the format of Header type.

Bit 7 of Header type is '0' for single-function devices, '1' for multi-function devices. PNX1300 is a single-function device, so bit 7 is '0'. Table 11-9 shows the encodings of the layout field.
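The three registers of Sections 11.5.7 to 11.5.9 share the configuration dword at offset 0x0c with the BIST register (Figure 11-2). The following C sketch unpacks that dword and models the rules given above; the byte order within the dword follows the standard PCI header layout, the chunk sizes follow Table 11-8, and all names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t cache_line_size;  /* bits 7..0, in units of 32-bit words */
    uint8_t latency_timer;    /* bits 15..8 */
    uint8_t header_type;      /* bits 23..16 */
    uint8_t bist;             /* bits 31..24 */
} pci_dword_0c;

void pci_unpack_0c(uint32_t dword, pci_dword_0c *out)
{
    assert(out != NULL);
    out->cache_line_size = (uint8_t)(dword & 0xFFu);
    out->latency_timer = (uint8_t)((dword >> 8) & 0xFFu);
    out->header_type = (uint8_t)((dword >> 16) & 0xFFu);
    out->bist = (uint8_t)((dword >> 24) & 0xFFu);
}

/* The three LSBs of the latency timer are hardwired to 0. */
uint8_t pnx1300_latency_timer_store(uint8_t written)
{
    return (uint8_t)(written & 0xF8u);
}

/* A stored value of 0 signifies the maximum, 256 PCI clocks. */
unsigned pnx1300_latency_clocks(uint8_t latency_timer)
{
    unsigned v = latency_timer & 0xF8u;
    return v != 0 ? v : 256u;
}

/* Write-and-invalidate chunk size in bytes, or 0 when only normal
 * memory-write cycles are performed. */
unsigned pnx1300_mwi_chunk_bytes(uint8_t cache_line_size)
{
    switch (cache_line_size) {
    case 4:
    case 8:
    case 16:
        return cache_line_size * 4u;
    default:
        return 0u;
    }
}

/* Header type: bit 7 is the multi-function flag, bits 6..0 the layout. */
int pci_is_multifunction(uint8_t header_type)
{
    return (header_type >> 7) & 1;
}

unsigned pci_header_layout(uint8_t header_type)
{
    return header_type & 0x7Fu;
}
```

For instance, a write of 0xff to the latency timer is stored as 0xf8 (248 clocks), and a cache line size of 8 selects 32-byte write-and-invalidate chunks.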
11.5.10 Built-In Self Test Register

When implemented, the BIST register is used to control the operation of a device's built-in self testing capability. PNX1300 does not implement BIST, so this register is hardwired to return '0's when read.

11.5.11 Base Address Registers

The PNX1300 PCI interface implements two configuration space memory base address registers: DRAM_BASE and MMIO_BASE. DRAM_BASE relocates PNX1300's SDRAM within the system address space; MMIO_BASE relocates the 2-MB memory-mapped I/O address aperture.

The values in the base address registers determine the address map as seen by both the DSPCPU and external PCI masters. These values are normally set once, and not changed dynamically once the DSPCPU operates.

Table 11-6. Base class encodings

Base class (hex)  Meaning
00                Device was built before class code definitions were finalized
01                Mass-storage controller
02                Network controller
03                Display controller
04                Multimedia device
05                Memory controller
06                Bridge device
07                Simple communications controller
08                Base system peripheral
0A                Docking station
0B                Processor
0C                Serial bus controller
0D-FE             Reserved
FF                Device does not fit any of the above classes

Table 11-7. Subclass & programming interface fields

Subclass (hex)  Programming interface (hex)  Meaning
00              00                           Video device
01              00                           Audio device
80              00                           Other multimedia device

Table 11-8. Cache line size values

Cache line size (binary)  Effect
0000,0100                 Write-and-invalidates are done in 4-dword, i.e. 16-byte chunks
0000,1000                 Write-and-invalidate in 8-dword chunks
0001,0000                 Write-and-invalidate in 16-dword chunks
All other values          Only normal "memory-write" is performed

Table 11-9. Layout encodings

Layout (hex)  Meaning
00            Non-bridge PCI device
01            PCI-to-PCI bridge device

Figure 11-6. Header type register format. (Bit 7: MF; bits 6..0: layout.)
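As a small worked illustration of Figure 11-5 and Tables 11-6 through 11-8, the C sketch below splits a 24-bit class code into its three fields and maps a Cache Line Size value to the write-and-invalidate chunk size. The helper names are hypothetical, and the field positions assume the standard PCI class-code layout (base class in bits 23..16).

```c
#include <stdint.h>

/* Class-code fields, assuming the layout of Figure 11-5. */
static uint8_t class_base(uint32_t cc)     { return (uint8_t)((cc >> 16) & 0xFFu); }
static uint8_t class_subclass(uint32_t cc) { return (uint8_t)((cc >> 8) & 0xFFu); }
static uint8_t class_prog_if(uint32_t cc)  { return (uint8_t)(cc & 0xFFu); }

/* Table 11-8: write-and-invalidate chunk size in bytes for a given
 * Cache Line Size value (in 32-bit words), or 0 when only a normal
 * memory-write is performed. */
static unsigned mwi_chunk_bytes(uint8_t cache_line_size)
{
    switch (cache_line_size) {
    case 4:
    case 8:
    case 16:
        return cache_line_size * 4u;
    default:
        return 0u;
    }
}
```

A class code of 0x048000, for instance, decodes to base class 04 (multimedia device), subclass 80 and programming interface 00, which Table 11-7 lists as "other multimedia device".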
Hardware reset initializes DRAM_BASE to 0x0 and MMIO_BASE to 0xEFE0,0000, after which the PNX1300 boot protocol sets the final value.

In standalone systems, the autonomous boot sequence is executed. In this case, the values of DRAM_BASE and MMIO_BASE are copied from the content of the serial boot EEPROM, as described in Section 13.2.2, "Initial DSPCPU Program Load for Autonomous Bootstrap."

In x86 or other host-assisted platforms, the PCI host-assisted boot sequence is executed. In this case, the base registers are not set from the EEPROM. Instead, the host BIOS executes a scan for devices on each PCI bus. During this scan, memory apertures needed by each device are determined, and a suitable base is assigned by the host BIOS. The details of this process are described below.

Figure 11-7 shows the formats for DRAM_BASE and MMIO_BASE. Following are descriptions of the register fields.

M (Memory). The value of the M bit indicates whether the desired resource is a memory or PC I/O aperture. The M bit is hardwired to '0', indicating a memory type aperture for both the DRAM_BASE and MMIO_BASE registers.

T (Type). The value of the T field indicates the size of the base address register and constraints on its relocatability. Table 11-10 lists the encodings and meanings of the T field. PNX1300's PCI-interface base registers are 32 bits wide and can be relocated in the 32-bit address space; thus, the value of the T field is '00' for both DRAM_BASE and MMIO_BASE.

P (Prefetchable). The value of the P bit indicates to other devices whether or not the range is prefetchable. The P bit in DRAM_BASE reflects the DRAM prefetchable attribute as set by the prefetchable bit in the boot PROM (refer to Table 13-5 on page 13-7 for programming). MMIO is not prefetchable, so the P bit is hardwired to '0' for MMIO_BASE.
Being prefetchable means there are no side effects on reads, the device returns all bytes on reads regardless of the byte enables, and host bridges can merge processor writes into this range without causing errors.

Note: The setting of the P bit does not change the behavior of the cache or memory interface. It simply signals to the host whether the range is assumed to be prefetchable.

DRAM/MMIO base address. In x86 or other host platforms, the configuration space DRAM base address and MMIO base address fields serve two purposes. First, the host BIOS software can use them to determine the sizes of the SDRAM and MMIO apertures. Second, the BIOS can write to these fields to cause the apertures to be relocated within the PCI memory address space.

To determine the size of an aperture, the BIOS first writes all '1's (0xFFFFFFFF) to the address field. When the BIOS reads the field immediately after, the value returned will have '0's in all don't-care bits and '1's in all required address bits. Required address bits form a left-aligned (i.e., starting at the MSB) contiguous field of '1's, thus effectively specifying the size of the aperture.

For example, the MMIO aperture is a fixed 2-MB space. After writing all '1's to the MMIO base address field, a subsequent read returns the value 0xFFE00000. The M, T, and P fields are all '0', indicating the aperture is memory (not I/O), can be relocated anywhere in a 32-bit address space, and is not prefetchable. Since the first '1' bit is at position 21, MMIO space is a 2-MB aperture (2^21 bytes). The host BIOS now assigns a suitable 2-MB aligned base address by writing to the MMIO_BASE register in configuration space.

The DRAM aperture can range in size from 1 MB to 64 MB (but the size must be a power of 2). Thus, the number of don't-care low-order address bits can range from 20 to 26.
The actual amount of SDRAM present is determined by the content of the first byte of the boot EEPROM, as described in Section 13.4, "Detailed EEPROM Contents." The PCI BIU uses this size to determine which of the bits marked 'SP' in Figure 11-7 are writable and which are set to '0'. This lets the BIOS determine the correct actual DRAM aperture size.

Table 11-10. Type field encodings

Type  Meaning
00    Base register is 32 bits wide; mapping can relocate anywhere in 32-bit memory space
01    Base register is 32 bits wide; mapping must relocate below 1 MB in memory space
10    Base register is 64 bits wide; mapping can relocate anywhere in 64-bit address space
11    Reserved

Figure 11-7. Base address register format. (Both registers: bit 0 is M, bits 2..1 are T, bit 3 is P. DRAM_BASE: DRAM base address in the upper bits, with the six size-dependent bits 25..20 marked SP and the remaining low address bits '0'. MMIO_BASE: MMIO base address in bits 31..21, with bits 20..4 '0'.)
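The aperture sizing procedure described above (write all '1's, read back, locate the lowest required address bit) can be sketched in C. This is a minimal illustration operating on a read-back value only; it does not perform any configuration-space access, and the function names are hypothetical.

```c
#include <stdint.h>

/* Low-order fields of a base address register (positions per Figure 11-7). */
static unsigned bar_m(uint32_t bar) { return bar & 0x1u; }         /* 0 = memory */
static unsigned bar_t(uint32_t bar) { return (bar >> 1) & 0x3u; }  /* Table 11-10 */
static unsigned bar_p(uint32_t bar) { return (bar >> 3) & 0x1u; }  /* prefetchable */

/* Aperture size in bytes, given the value read back after writing
 * 0xFFFFFFFF to the register. Returns 0 if no address bits are set. */
static uint32_t bar_aperture_size(uint32_t readback)
{
    uint32_t addr = readback & 0xFFFFFFF0u;   /* drop the M, T and P fields */
    if (addr == 0)
        return 0;
    /* Two's complement of a left-aligned run of 1s isolates its lowest bit. */
    return (uint32_t)(~addr + 1u);
}
```

With the MMIO read-back value 0xFFE00000 from the example above, `bar_aperture_size` returns 0x00200000, i.e. 2 MB. A DRAM read-back of 0xFC000000 would correspond to the 64-MB maximum.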
11.5.12 Subsystem ID, Subsystem Vendor ID Register

The Subsystem ID and Subsystem Vendor ID are new in PCI Rev 2.1. These fields are optional, but their use is highly recommended as a means to have software drivers identify the board rather than the chip on the board.

This register is implemented from PNX1300 onwards, and replaces the "personality" register functionality in the TriMedia CTC chip.

The board manufacturer chooses the values of both 16-bit fields by modifying the PNX1300 boot EEPROM. The location of these bits is described in Section 13.4, "Detailed EEPROM Contents." A legal Vendor ID must be obtained from the PCI SIG. The vendor is free to assign Subsystem IDs.

11.5.13 Expansion ROM Base Address Register

The Expansion ROM Base Address register is similar in purpose to the SDRAM and MMIO base address registers. This register relocates a separate memory aperture for PCI devices that wish to implement additional ROM. PNX1300 does not implement expansion ROM; consequently, the least-significant bit of this register (which indicates whether or not PNX1300 responds to expansion ROM accesses) is hardwired to '0'. All other bits also read as '0's.

11.5.14 Interrupt Line Register

The value of the Interrupt Line register determines which input of the system interrupt controller is driven by PNX1300's interrupt pin. As it configures the system and assigns resources, host system software writes this register to assign one of the system interrupt lines to PNX1300.

11.5.15 Interrupt Pin Register

The value of the Interrupt Pin register determines which interrupt pin PNX1300 uses. Table 11-11 lists the possible values for this register. Since PNX1300 uses INTA#, the value of this register is hardwired to '1'.

11.5.16 MAX_LAT, MIN_GNT Registers

The value in the MAX_LAT register specifies how often the PNX1300 PCI interface needs access to the PCI bus.
The value in the MIN_GNT register specifies the minimum length for a burst period on the PCI bus.

Both of these timer values are specified as multiples of 250 ns. Values of '0' indicate that a device has no specific requirements for latency and burst length. For PNX1300, MAX_LAT is hardwired to 0x01 (250 ns), and MIN_GNT is hardwired to 0x03 (750 ns).

11.6 Registers in MMIO Space

The PNX1300 PCI interface contains 13 MMIO registers; most, except the status bits in BIU_STATUS, are usually written only by the DSPCPU. Table 11-12 lists the supported cycles sequenced by the PCI interface and the registers involved in each cycle. To ensure compatibility with future devices, all undefined MMIO bits should be ignored when read, and written as '0's.

The MMIO registers are all accessible to DSPCPU software, and all but the PCI_ADR and PCI_DATA registers are accessible to external PCI initiators. The facilities of PNX1300's PCI interface can be useful to external initiators in certain circumstances. For example:

- The PCI DMA engine might be useful during host-assisted boot.
- Host-resident diagnostics may want to test the PCI interface during boot.
- The MMIO registers can be used to diagnose malfunctioning parts.

Note, however, that external PCI initiators can access MMIO registers in only one way: as 32-bit words on naturally aligned, 32-bit addresses. If any other type of access is attempted, the results are undefined. Also, the byte order of the external initiator and the PCI interface must be the same; otherwise, the result of an access with disagreeing byte order is undefined.

For easy reference, Table 11-13 lists the MMIO registers together with their offsets from MMIO_BASE and their accessibility by the DSPCPU and external PCI initiators.

Figure 11-8 shows the formats of the PCI interface MMIO registers. The following are detailed descriptions of the MMIO registers.
11.6.1 DRAM_BASE Register

The DRAM_BASE register in MMIO space is a shadow copy of the DRAM_BASE register in PCI configuration space. See Section 11.5.11, "Base Address Registers," for more details. This copy provides MMIO-space access to this register. The P, T and M bitfields of this MMIO register are read-only.

11.6.2 MMIO_BASE Register

The MMIO_BASE register in MMIO space is a copy of the MMIO_BASE register in PCI configuration space. See Section 11.5.11, "Base Address Registers," for more details. This shadow copy provides MMIO-space access to this register. The P, T and M bitfields of this MMIO register are read-only.

Table 11-11. Interrupt pin encodings

Interrupt pin  Meaning
1              Use interrupt pin INTA#
2              Use interrupt pin INTB#
3              Use interrupt pin INTC#
4              Use interrupt pin INTD#
All others     Reserved

11.6.3 MMIO/DRAM_BASE Updates

The DRAM_BASE and MMIO_BASE registers are not normally written through MMIO; their value is determined by the boot process. Though not recommended, the registers are writable in MMIO. Special care should be exercised when writing these registers:

- Writing to DRAM_BASE moves the origin of any executing DSPCPU program, which will cause it to fail.
- Writing to MMIO_BASE moves devices around, and moves MMIO_BASE and DRAM_BASE around.
- Writing to both registers in sequence requires a delay, due to the implementation. It is recommended to space such writes far apart, or to iterate until the first register written reads back with the new value before writing the second one.

Figure 11-8. PCI interface registers accessible in MMIO address space. (Register layout diagram. It shows DRAM_BASE, MMIO_BASE, BIU_STATUS, BIU_CTL, PCI_ADR, PCI_DATA, CONFIG_ADR, CONFIG_DATA, CONFIG_CTL, IO_ADR, IO_DATA, IO_CTL, SRC_ADR, DEST_ADR, DMA_CTL and INT_CTL, all R/W, at the MMIO_BASE offsets listed in Table 11-13. Labeled fields include, in BIU_STATUS, the done/busy pairs for config_cycle, io_cycle, dma_cycle and PCI-to-SDRAM writes, the two duplicate-request error flags, and RMA, RTA and TTE; in BIU_CTL, SE, BO, INTE, IE, HE, CR, SR and RMD; in CONFIG_ADR, BN, RN, FN and DN; in CONFIG_CTL and IO_CTL, BE and RW; in DMA_CTL, TL, D and T.)
11.6.4 BIU_STATUS Register

The BIU_STATUS register holds bits that track the status of bus cycles initiated by the DSPCPU and bus cycles from external devices that write into SDRAM. Two bits of status are provided for each type of bus cycle: a busy bit and a done bit. The DSPCPU can read both bits; a done bit is cleared by writing a '1' to it. The status register also holds two error-flag bits.

DSPCPU software must check the busy bits to avoid issuing a PCI interface bus cycle request while a request of a similar type is in progress. If a bus cycle is issued while a request of similar type is in progress, the PCI interface ignores the second command and sets the appropriate error bit in the status register.

When the DSPCPU issues either an io_cycle or config_cycle request while a previous request of either type is already in progress, the PCI interface sets bit 8 in BIU_STATUS. When the DSPCPU issues a dma_cycle while a previous one is already in progress, the PCI interface sets bit 9 in BIU_STATUS. To reset either of the error bits 8 or 9 in BIU_STATUS, write a '1' to it.

RTA (Received Target Abort). This bit is set when PNX1300 initiated a transaction that was aborted by the target. To reset this bit, write a '1' to this bit position. This bit is set simultaneously with the RTA bit in the configuration space Status register, but is cleared independently.

RMA (Received Master Abort). This bit is set when PNX1300 initiated a transaction and aborts it. This usually signals a transaction to a nonexistent device. To reset this bit, write a '1' to this bit position. This bit is set simultaneously with the RMA bit in the configuration space Status register, but is cleared independently.

TTE (Target Timer Expired).
In normal operation, a read of a PNX1300 data item is performed on a retry basis: PNX1300 tells the external master to retry while it fetches the data item across the highway. This bit is set if an external master did not retry a read of a PNX1300 data item within 32768 PCI clocks. The requested data is discarded. To reset this bit, write a '1' to this bit position. This is purely a software information bit. No software action is required when this condition occurs, but it may indicate a non-compliant or defective master on the bus.

11.6.5 BIU_CTL Register

The BIU_CTL register contains bits that control miscellaneous aspects of the PCI interface operation. Following are descriptions of the fields.

SE (Swap bytes Enable). This bit is initialized after reset to '0', which causes the PCI interface to operate in its default big-endian mode. Writing a '1' to SE causes accesses to MMIO registers over the PCI interface to be made in little-endian mode.

BO (Burst mode Off). This bit is initialized to '0', which allows the PCI interface to support burst-mode writes as a target on the PCI bus. Setting this bit to '1' disables burst-mode writes.

With burst mode enabled, the PCI interface buffers as much data as possible into r_buffer before issuing a disconnect to the PCI initiator. With burst mode disabled, the PCI interface buffers only one data phase before issuing a disconnect to the PCI initiator.

INTE (Interrupt Enables). The bits in the INTE field control the signaling of interrupts to the DSPCPU for PCI interface events. These events raise DSPCPU interrupt 16 if enabled. Interrupt 16 must be set up as a level-triggered interrupt. Table 11-14 lists the function of each INTE bit. INTE is initially set to '0's (interrupts disabled).

Note that the error condition masked by bit 6 (see Section 11.6.4, "BIU_STATUS Register") occurs when either a config_cycle or an io_cycle is requested and a request of either type is already in progress. That is, the second request need not be of exactly the same type that is already in progress.

Table 11-12. PCI MMIO registers and bus cycles

Internal cycle                             Registers involved
mmio_cycle (MMIO register R/W)             All registers accessible by external PCI devices
mem_cycle (PCI-space memory R/W)           PCI_ADR, PCI_DATA
dma_cycle (block data transfer)            SRC_ADR, DEST_ADR, DMA_CTL
io_cycle (I/O register R/W)                IO_ADR, IO_DATA, IO_CTL
config_cycle (configuration register R/W)  CONFIG_ADR, CONFIG_DATA, CONFIG_CTL

Table 11-13. PCI MMIO register accessibility

Register     MMIO_BASE offset  DSPCPU  External initiator
DRAM_BASE    0x10 0000         R/W     R/W
MMIO_BASE    0x10 0400         R/W     R/W
BIU_STATUS   0x10 3004         R/W     R/W
BIU_CTL      0x10 3008         R/W     R/W
PCI_ADR      0x10 300C         R/W     -/-
PCI_DATA     0x10 3010         R/W     -/-
CONFIG_ADR   0x10 3014         R/W     R/W
CONFIG_DATA  0x10 3018         R/W     R/W
CONFIG_CTL   0x10 301C         R/W     R/W
IO_ADR       0x10 3020         R/W     R/W
IO_DATA      0x10 3024         R/W     R/W
IO_CTL       0x10 3028         R/W     R/W
SRC_ADR      0x10 302C         R/W     R/W
DEST_ADR     0x10 3030         R/W     R/W
DMA_CTL      0x10 3034         R/W     R/W
INT_CTL      0x10 3038         R/W     R/W

IE (ICP DMA Enable). This bit must be set to '1' to allow the ICP to write pixel data through the PCI interface. If this bit is cleared to '0', the ICP is not allowed to use the PCI interface. Programming of ICP DMA is described in Section 14.6, "Operation and Programming."

HE (Host Enable). This bit is initialized to '0', which prevents the DSPCPU from serving as the host CPU in the PCI system. If this bit is set to '1', the Enable Mastering (EM) bit in the PCI configuration register (see Section 11.5.3, "Command Register") is also set to '1' (since PNX1300 must be enabled to serve as a PCI bus initiator to perform PCI configuration).

CR (PCI Clear Reset). This bit releases the DSPCPU from its reset state. The PNX1300 device driver (executing on an external host CPU) sets this bit to '1' after it completes PNX1300's configuration. The DSPCPU starts executing at the address pointed to by the DRAM_BASE MMIO register.

SR (PCI Set Reset). This bit forces the DSPCPU into its reset state. Writing '1' to this bit resets the CPU; writing '0' causes no action. The PNX1300 device driver (executing on an external host CPU) can set this bit to reset the DSPCPU. This form of reset resets only the CPU and the instruction cache. The data cache is not reset, nor are any peripherals.

RMD (Read Multiple Disable). In default operating mode, the RMD bit should be set to '0'. In that case, the BIU uses "memory read multiple" PCI transactions for BIU DMA, and "memory read" PCI transactions for DSPCPU reads to PCI space. If the RMD bit is set, DMA transactions are also forced to use the less efficient "memory read" transactions. Note that TM-1000 only used "memory read" transactions.

11.6.6 PCI_ADR Register

The 30-bit PCI_ADR register is intended to be written only by the data cache.
PCI_ADR participates in the special two-cycle data-cache-to-PCI protocol. See Section 11.6.7, "PCI_DATA Register," for more information.

Only the DSPCPU can write to PCI_ADR. External PCI initiators can neither read nor write this register.

DSPCPU software should not write to this register (by writing to PCI_ADR in MMIO space). This register is intended only to support the special protocol between the data cache and the PCI bus. An unexpected write to PCI_ADR via MMIO space will not be prevented by hardware and may result in data corruption on the PCI bus.

11.6.7 PCI_DATA Register

The 32-bit PCI_DATA register is intended to be used only by the data cache. PCI_DATA participates in the special two-cycle data-cache-to-PCI protocol.

The PCI_DATA and PCI_ADR registers are used together by the data cache to perform a single data phase PCI memory-space read or write. A read operation is triggered when the data cache has written the transaction address into PCI_ADR and asserted the internal signal pci_read_operation (a direct internal connection between the data cache and PCI interface). A write operation is triggered when the data cache has written both PCI_ADR and PCI_DATA with the signal pci_read_operation deasserted.

While the PCI interface is performing the PCI read or write, the DSPCPU is stalled waiting for the completion of the PCI transaction. When the PCI transaction is complete, the PCI interface asserts pci_ready (a direct internal connection between the data cache and PCI interface). To finish a read operation, the data cache reads the PCI_DATA register, forwards the data to the DSPCPU, and then unlocks the DSPCPU. To finish a write, the data cache simply unlocks the DSPCPU.

Note that, if the DSPCPU attempts to access a non-existent PCI address, an RMA condition occurs. In this case, the value in the PCI_DATA register is set to '0'. Hence, the DSPCPU always reads non-existent PCI locations as '0'.
Normal MMIO write operations to PCI_DATA have no effect. Reads return the register's current value. External PCI initiators can neither read nor write this register.

11.6.8 CONFIG_ADR Register

The CONFIG_ADR register is written by the DSPCPU to set up for a configuration cycle. When PNX1300 is acting as the host CPU, it must configure devices on the PCI bus. The DSPCPU writes CONFIG_ADR to select a configuration register within a specific PCI device. See Section 11.6.10, "CONFIG_CTL Register," for more information on initiating configuration cycles. Following are descriptions of the fields of CONFIG_ADR.

BN (PCI Bus Number). The BN field (the two least-significant bits of CONFIG_ADR) selects one of four possible PCI buses. A value of '0' for BN means that the targeted device is on the PCI bus directly connected to PNX1300 and that any PCI-to-PCI bridges should ignore the configuration address. Any value for BN other than '0' means that the targeted device is on a PCI bus connected to a PCI-to-PCI bridge and that all devices directly connected to PNX1300's local PCI bus should ignore the configuration address.

RN (Register Number). The RN field (bits 2..7 of CONFIG_ADR) is used to specify one of the 64 configuration words within the target device's configuration space.

FN (Function Number). The FN field (bits 8..10 of CONFIG_ADR) is used to specify one of up to eight functions of the addressed PCI device.

DN (Device Number). The DN field (bits 11..31 of CONFIG_ADR) is used to select the targeted PCI device. Each bit corresponds to one of the 21 possible PCI devices on a single PCI bus, i.e., each bit corresponds to the IDSEL signal of one PCI device. Only one IDSEL signal (and, therefore, only one DN bit) can be asserted during a given configuration cycle.

Table 11-14. INTE bit functions

BIU_CTL bit  If set to '1', interrupt DSPCPU when...
2            config_cycle done
3            io_cycle done
4            dma_cycle done
5            pci_dram write cycle done
6            Second config_cycle or io_cycle requested
7            Second dma_cycle requested

11.6.9 CONFIG_DATA Register

The 32-bit CONFIG_DATA register is used by the DSPCPU to buffer data for a configuration cycle. When PNX1300 is acting as the host CPU, it must configure the PCI bus and devices. The DSPCPU writes or reads CONFIG_DATA depending on whether it is performing a write or read to a PCI device's configuration space. See Section 11.6.10, "CONFIG_CTL Register," for more information on initiating configuration cycles.

11.6.10 CONFIG_CTL Register

The DSPCPU writes to CONFIG_CTL to trigger a configuration read or write cycle on the PCI bus. A PCI configuration read or write should not be performed during an ongoing PCI I/O read or write.

The steps involved in a DSPCPU PCI configuration access are:

1. Wait until BIU_STATUS io_cycle.busy and config_cycle.busy are both de-asserted.
2. Write to CONFIG_ADR as described above, and (in case of a write operation) write to CONFIG_DATA.
3. Write to CONFIG_CTL to start the read or write. This action sets config_cycle.busy.
4. Wait (polling or interrupt based) until config_cycle.done is asserted by the hardware.
5. Retrieve the requested data in CONFIG_DATA (in case of a read).
6. Clear config_cycle.done by writing a '1' to it.

Following are descriptions of the fields of CONFIG_CTL and a discussion of how a DSPCPU write to CONFIG_CTL triggers configuration cycles.
BE (Byte Enables). The BE field (the four LSBs of CONFIG_CTL) determines the state of PCI's 4-line C/BE# bus during the data phase of a configuration cycle. Since the C/BE# bus signals are active low, a '0' in a BE field bit means "byte participates"; a '1' in a BE field bit means "byte does not participate." Table 11-15 shows the correspondence between BE bits and bytes on the PCI bus assuming little-endian byte order.

RW (Read/Write). The RW field (bit 4 of CONFIG_CTL) determines whether the configuration cycle will be a read or a write. Table 11-16 shows the interpretation of RW.

A write by the DSPCPU to the CONFIG_CTL register starts a configuration cycle on the PCI bus. The CONFIG_DATA (for a write) and CONFIG_ADR registers must be set up before writing to CONFIG_CTL.

During a configuration read, the PCI interface drives the PCI bus with the address from CONFIG_ADR and the BE field from CONFIG_CTL. The returned data is buffered in CONFIG_DATA. When the data is returned, the PCI interface will generate a DSPCPU interrupt if the appropriate INTE bit is set in BIU_CTL. Alternatively, DSPCPU software can poll the appropriate "done" status bit in BIU_STATUS. Finally, DSPCPU software reads the CONFIG_DATA register in MMIO space to access the data returned from the configuration cycle.

A write operation proceeds as for a read, except that PCI data is driven from CONFIG_DATA during the transaction and no data is returned in CONFIG_DATA.

11.6.11 IO_ADR Register

The 32-bit IO_ADR register is written by the DSPCPU to set up for an access to a location in PCI I/O space. The DSPCPU writes the address of the I/O register into IO_ADR. See Section 11.6.13, "IO_CTL Register," for more information on initiating I/O cycles.

11.6.12 IO_DATA Register

The 32-bit IO_DATA register is used by the DSPCPU to set up for an access to a location in PCI I/O space. The DSPCPU writes or reads IO_DATA depending on whether it is performing a write to or a read from I/O space.
See Section 11.6.13, "IO_CTL Register," for more information on initiating I/O cycles.

11.6.13 IO_CTL Register

The DSPCPU writes to IO_CTL to trigger a read or write access to PCI I/O space. The function of this register is similar to that of CONFIG_CTL, and the protocol for an I/O cycle is similar to the configuration cycle protocol. A PCI I/O read or write should not be performed during an ongoing PCI configuration read or write.

Table 11-15. BE field interpretation (assumes little-endian byte ordering)

BE bit  Value  Interpretation
0       0      Byte 0 (LSB) participates
0       1      Byte 0 (LSB) does not participate
1       0      Byte 1 participates
1       1      Byte 1 does not participate
2       0      Byte 2 participates
2       1      Byte 2 does not participate
3       0      Byte 3 (MSB) participates
3       1      Byte 3 (MSB) does not participate

Table 11-16. RW interpretation

RW  Interpretation
0   Write
1   Read

The steps involved in a DSPCPU PCI I/O access are:

1. Wait until BIU_STATUS io_cycle.busy and config_cycle.busy are both de-asserted.
2. Write the I/O address to IO_ADR, and (in case of a write operation) write data to IO_DATA.
3. Write to IO_CTL to start the read or write. This action sets io_cycle.busy.
4. Wait (polling or interrupt based) until io_cycle.done is asserted by the hardware.
5. Retrieve the requested data in IO_DATA (in case of a read).
6. Clear io_cycle.done by writing a '1' to it.

Following are descriptions of the fields of IO_CTL and a discussion of how a DSPCPU write to IO_CTL triggers I/O cycles.

BE (Byte Enables). The BE field (the four least-significant bits of IO_CTL) determines the state of PCI's 4-line C/BE# bus during the data phase of an I/O cycle. Since the C/BE# bus signals are active low, a '0' in a BE field bit means "byte participates"; a '1' in a BE field bit means "byte does not participate." Table 11-15 shows the correspondence between BE bits and bytes on the PCI bus assuming little-endian byte order.

RW (Read/Write). The RW field (bit 4 of IO_CTL) determines whether the I/O cycle will be a read or a write. Table 11-16 shows the interpretation of RW (0 = write, 1 = read).

A write by the DSPCPU to the IO_CTL register starts an I/O cycle on the PCI bus. The IO_DATA (for a write) and IO_ADR registers must be set up before writing to IO_CTL.

During an I/O read, the PCI interface drives the PCI bus with the address from IO_ADR and the BE field from IO_CTL. The returned data is buffered in IO_DATA. When the data is returned, the PCI interface will generate a DSPCPU interrupt if the appropriate INTE bit is set in BIU_CTL. Alternatively, DSPCPU software can poll the appropriate "done" status bit in BIU_STATUS.
finally, dspcpu software reads the io_data register in mmio space to access the data re turned from the i/o cycle. a write operation proceeds as for a read, except that pci data is driven from io_data during the transaction and no data is returned in io_data. 11.6.14 src_adr register the 32-bit src_adr register is used to set the source address for a block transfer dma operation. the address in src_adr must be word (4 -byte) aligned, i.e. the 2 lsbs have to be ?0?. the content of this register during or after dma is not defined, hence it cannot be used to track progress or verify completion of a dma transaction. 11.6.15 dest_adr register the 32-bit dest_adr register is used to set the desti- nation address for a block transfer dma operation. the address is dest_adr must be word (4 byte) aligned, i.e. the 2 lsbs must be ?0?. the content of this register during or after dma is not defined, hence it cannot be used to track progress or verify completion of a dma transaction. 11.6.16 dma_ctl register a write by the dspcpu to the dma_ctl register starts a dma block transfer on the pci bus. the src_adr and dest_adr registers must be set up before writing to dma_ctl. the steps involved in a dma transfer are: 1. wait until biu_status dma_cycle.busy is de-as- serted 2. write to src_adr and dest_adr as described above 3. write to dma_ctl to star t the dma transaction.this action sets dma_cycle.busy 4. wait (polling or interrupt based) until dm a_cycle.done is asserted by the hardware 5. clear dma_cycle.done by writing a ?1? to it the fields of dma_ctl are described below. tl (transfer length). the tl field (bits 0..25 of dma_ctl) specifies the number of data bytes to be transferred during the dma operation. it must be a multi- ple of 4 bytes. the maximu m length of a dma operation is limited to 64 mb, the maximum amount of sdram supported by pnx1300. the c ontent of this field during or after a dma transaction is not defined. d (dma direction). 
the d field (bit 26 of dma_ctl) determines the direction of data movement during the block transfer. table 11-17 shows the interpretation of the d field.

t (dma transaction type). the t field (bit 27 of dma_ctl) determines the transaction type of a write, as described below.

table 11-17. d interpretation
d   data movement direction
0   sdram → pci memory space (dma write)
1   pci memory space → sdram (dma read)

table 11-18. t interpretation
t   dma write transaction type
0   memory write
1   memory write-and-invalidate
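the dma_ctl field layout described above (tl in bits 0..25, d in bit 26, t in bit 27) can be sketched as a small encoding helper. this is only an illustration: the macro and function names are hypothetical and not part of any philips header, and the sketch rejects a length of exactly 64 mb simply because that value does not fit in a 26-bit field.

```c
#include <stdbool.h>
#include <stdint.h>

/* Field positions taken from the text above; the names are hypothetical. */
#define DMA_CTL_TL_MASK 0x03FFFFFFu  /* tl: bits 0..25, byte count          */
#define DMA_CTL_D_BIT   (1u << 26)   /* d: 0 = sdram->pci, 1 = pci->sdram   */
#define DMA_CTL_T_BIT   (1u << 27)   /* t: 0 = memory write, 1 = write-and-invalidate */

/* Builds a dma_ctl value. Returns false if the length cannot be encoded. */
static bool dma_ctl_encode(uint32_t length_bytes, bool pci_to_sdram,
                           bool write_and_invalidate, uint32_t *out)
{
    if ((length_bytes & 3u) != 0)          /* tl must be a multiple of 4 bytes */
        return false;
    if (length_bytes > DMA_CTL_TL_MASK)    /* must fit in the 26-bit tl field  */
        return false;

    uint32_t value = length_bytes;
    if (pci_to_sdram)
        value |= DMA_CTL_D_BIT;
    if (write_and_invalidate)
        value |= DMA_CTL_T_BIT;
    *out = value;
    return true;
}
```

in a real driver the resulting word would be written to dma_ctl only after src_adr and dest_adr have been set up and dma_cycle.busy is de-asserted, as listed in the steps above.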
pnx1300 generates memory write-and-invalidate pci transactions if all conditions below are satisfied; otherwise it generates regular memory write transactions:
• the mwi bit in the command register is set.
• the cache line size register is set to 4, 8, or 16 32-bit words.
• the dma source address is 64-byte aligned.
• the dma destination address is cache line size aligned.
• the t bit is set.

pnx1300 generates "memory read multiple" pci transactions for dma reads, unless the rmd (read multiple disable) bit is set in biu_ctl, in which case the less efficient "memory read" transactions are used.

during a pci → sdram block transfer, the pci interface drives the pci bus with the address from src_adr. the returned data is buffered in r_buffer. the pci interface then drives the address from dest_adr and the data from r_buffer to the sdram controller. src_adr and dest_adr are incremented, the tl field in dma_ctl is decremented, and this sequence repeats until tl reaches "0".

at the end of the pci → sdram block transfer, the pci interface will generate a dspcpu interrupt if the appropriate inte bit is set in biu_ctl. alternatively, dspcpu software can poll the appropriate "done" status bit in biu_status.

during an sdram → pci block transfer, the pci interface drives the address from src_adr to the sdram controller. the returned data is buffered in w_buffer. the pci interface then drives the address from dest_adr and the data from w_buffer to the pci bus. src_adr and dest_adr are incremented, the tl field in dma_ctl is decremented, and this sequence repeats until tl reaches "0".

at the end of the sdram → pci block transfer, the pci interface can generate a dspcpu interrupt if the appropriate inte bit is set in biu_ctl. alternatively, dspcpu software can poll the appropriate "done" status bit in biu_status.
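the five conditions for memory write-and-invalidate listed in this section can be written as a single predicate, which may help when checking buffer alignment in software before starting a dma write. this is a sketch under stated assumptions: the function name and parameters are hypothetical, and the caller is assumed to have read the mwi bit and cache line size from pci configuration space.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true when, per the conditions in the text, a dma write would be
 * issued as memory write-and-invalidate rather than plain memory write.
 * cache_line_words is the cache line size register value in 32-bit words. */
static bool dma_write_uses_mwi(bool mwi_bit_set, uint32_t cache_line_words,
                               uint32_t src_adr, uint32_t dest_adr, bool t_bit)
{
    if (!mwi_bit_set || !t_bit)
        return false;
    if (cache_line_words != 4 && cache_line_words != 8 && cache_line_words != 16)
        return false;
    if ((src_adr & 63u) != 0)                     /* source must be 64-byte aligned */
        return false;

    uint32_t line_bytes = cache_line_words * 4u;  /* words to bytes */
    return (dest_adr % line_bytes) == 0;          /* destination aligned to a cache line */
}
```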
11.6.17 int_ctl register

the int_ctl register contains three fields for setting, enabling, and sensing the four pci interrupt lines. table 11-19 shows the interpretation of the fields in int_ctl.

int (interrupt bits). the int field (bits 0..3 of int_ctl) can force a pci interrupt to be signalled.

ie (interrupt enable). the ie field (bits 4..7 of int_ctl) enables pnx1300 to drive pci interrupt lines.

is (interrupt state). the is field (bits 8..11 of int_ctl) senses the state of the pci interrupt lines.

figure 11-9 shows a conceptual realization of the logic used to implement the control of each intx# pin. see also section 3.6, "pnx1300 to host interrupts."

table 11-19. int_ctl bits
field  bit  pci signal  programming
int    0    inta#       0 = deassert intx#
       1    intb#       1 = assert intx# (if enabled); i.e., pull intx# pin to a low logic level
       2    intc#
       3    intd#
ie     4    inta#       0 = disable open-collector output to intx#
       5    intb#       1 = enable open-collector output to intx#
       6    intc#
       7    intd#
is     8    inta#       reads state of intx# pin:
       9    intb#       0 = no interrupt asserted (intx# is high)
       10   intc#       1 = interrupt is asserted (intx# is low)
       11   intd#

figure 11-9. conceptual realization of intx# pin control logic.

11.7 pci bus protocol overview

pnx1300's pci interface can generate and respond to several types of pci bus commands. table 11-20 lists the 12 possible commands and whether or not pnx1300 can generate them. table 11-21 lists the 12 possible commands and whether or not pnx1300 can respond to them.

table 11-20. pnx1300 pci commands as initiator
pnx1300 generates              pnx1300 cannot generate
configuration read             interrupt acknowledge
configuration write            special cycle
memory read                    dual address
memory read multiple           memory read line
memory write
memory write and invalidate
i/o read
i/o write

the basic transfer mechanism on the pci bus is a burst, which consists of an address phase followed by one or more data phases. in pnx1300, the dspcpu and icp are the only two units that can cause pnx1300 to become a pci-bus initiator, i.e., only the dspcpu and icp can access external resources.

11.7.1 single-data-phase operations

when the dspcpu reads or writes pc memory, the pci transaction has only a single data phase. a typical single-data-phase read operation is illustrated in figure 11-10. during the first clock period, the pnx1300 asserts the frame# signal to indicate that the transaction has begun and that an address and command are stable on ad and c/be#, respectively. pnx1300 then releases the ad bus, deasserts frame#, asserts irdy#, asserts byte enables on c/be#, and waits for the target to claim the transaction by asserting devsel#.

the target asserts trdy# to signal the master that the ad bus contains stable data. the assertion of trdy# causes the initiator (pnx1300 in this case) to sample the ad bus data and deassert irdy# to complete the single-data-phase read transaction.

figure 11-11 shows a typical single-data-phase write operation. the operation begins like a read: pnx1300 asserts the frame# signal, drives the ad bus with the target address, and drives the command onto the c/be# bus. the operation continues when pnx1300 deasserts frame#, asserts irdy#, and drives the byte enables as before, but it also drives the data to be written on the ad bus. the target device asserts devsel# to claim the transaction. eventually, the target asserts trdy# to signal that it is sampling the data on the ad bus. pnx1300 continues to drive the data on the ad bus until after the target deasserts trdy#, which completes the write operation.

11.7.2 multi-data-phase operations

as with the single-data-phase operations, dma operations begin with the assertion of frame# and valid address and command information. see figure 11-12. the target knows a burst is requested because frame# remains asserted when irdy# becomes asserted.
in the example timing of figure 11-12, a fast device is receiving the burst from pnx1300. the target asserts devsel# and trdy# simultaneously. the trdy# signal remains asserted while pnx1300 sends a new word of data on each pci clock cycle. the burst operation shown is a 16-word burst transfer. since only the starting address is sent by the initiator, both initiator and target must increment source and destination addresses during the burst.

the initiator signals the end of the burst of data in figure 11-12 when it deasserts frame# in clock 17. the last word (or partial word) of data is transferred in the clock cycle after frame# is deasserted. finally, the target acknowledges the last data phase by deasserting trdy# and devsel#.

figure 11-13 illustrates back-to-back dma burst data transfers. the icp is capable of exploiting the high bandwidth available with back-to-back dma operations when it is writing image data to a frame buffer on a pci video card.

the timing of figure 11-13 assumes that the pci bus is granted to pnx1300 until at least the beginning of the second dma burst operation. for as long as bus ownership is granted to pnx1300 and the icp has queued requests for data transfer, the pci interface will perform back-to-back dma operations. if the target eventually becomes unable to accept more data, it signals a disconnect on the pnx1300 pci interface. the pci interface remembers where the dma burst was interrupted and attempts to restart from that point after two bus clocks.

table 11-21. pnx1300 pci commands as target
pnx1300 responds to            pnx1300 ignores
configuration read             i/o read
configuration write            i/o write
memory read                    interrupt acknowledge
memory write                   special cycle
memory write and invalidate    dual address
memory read line
memory read multiple

figure 11-10. basic single-data-phase read operation. (timing diagram of pci_clk, frame#, ad, c/be#, irdy#, trdy#, and devsel# over clocks 1 to 4: address and command, then byte enables, an ad turnaround wait, and the data transfer.)

figure 11-11. basic single-data-phase write operation. (timing diagram of pci_clk, frame#, ad, c/be#, irdy#, trdy#, and devsel# over clocks 1 to n: address and command, then data and byte enables, a wait, and the data transfer.)
11.8 limitations

11.8.1 bus locking

the pci interface does not implement the lock#, sbo#, and sdone pins. consequently, it is possible for both the dspcpu and external pci initiators to write to a critical memory section simultaneously. software must implement policies to guarantee memory coherency.

11.8.2 no expansion rom

pnx1300 does not implement the pci expansion rom capability.

11.8.3 no cacheline wrap address sequence

the pci interface does not implement the pci cacheline-wrap address mode for external pci initiators that access pnx1300 sdram.

11.8.4 no burst for i/o or configuration space

only single-data-phase transactions to configuration and i/o spaces are supported. the byte-enable signals select the byte(s) within the addressed word.

11.8.5 word-only mmio register access

external initiators can access pnx1300 mmio registers only as full words. the byte-enable signals have no effect on the data transferred. external initiators must read and write all four bytes of mmio registers.

figure 11-12. pci burst write operation with 16 data phases. (timing diagram of pci_clk, frame#, ad, c/be#, irdy#, trdy#, and devsel# over clocks 1 to 18: address and command, then data 1 to data 16 with byte enables.)

figure 11-13. back-to-back pci burst write operations with 16 data phases, which might be generated by the icp when writing image data to a pci-resident video frame buffer. (timing diagram over clocks 1 to 36: two consecutive bursts, data 1 to data 16 and data 17 to data 32.)
chapter 12 sdram memory system

by eino jacobs, chris nelson, thorwald rabeler, mohammed yousuf, luis lucas

12.1 new in pnx1300/01/02/11

• support of 256-mbit sdrams organized in x16. the refresh counter must be changed. refer to section 12.11 for more details.
• 16-bit memory interface support in addition to the 32-bit mode of tm-1300.

12.2 pnx1300 main memory overview

in this document, the generic pnx1300 name refers to the pnx1300 series, or the pnx1300/01/02/11 products.

pnx1300 connects to its local memory system with a dedicated memory bus, shown in figure 12-1. this bus interfaces only with sdram or sgram (synchronous graphics dram with its dsf pin tied low); pnx1300 is the only master on this bus.

a variety of device types, speeds, and rank¹ sizes are supported, allowing a wide range of pnx1300 systems to be built. table 12-1 summarizes the memory system features. the memory devices can have two or four banks.

the main memory interface provides all control and data signals with sufficient drive capacity for a glueless connection up to a 183-mhz memory system (for pnx1302, 166 mhz otherwise) with up to two memory devices.

the memory-system speed can be different from the pnx1300 core speed; the ratio between the memory system clock and the pnx1300 core clock is programmable.

with current memory technology, pnx1300 supports a glueless memory interface of up to 64 mbytes with two 4x4mx16 sdram chips (two devices with 4 banks of four million words, each 16 bits wide).

pnx1300 also provides a 16-bit memory interface (instead of 32-bit only for tm-1300) for applications requiring lower cost and lower performance. the available bandwidth is then reduced by a factor of two, and the latency on cache misses is increased by two for the instruction cache and by one sdram cycle for the data cache on critical word first demand. the maximum amount of memory in the 16-bit mode is 32 mbytes.
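the peak bandwidth figures quoted in this chapter follow directly from the memory clock and the bus width, since sdram transfers one bus-wide word per clock. the helper below is a minimal sketch of that arithmetic; the function name is hypothetical, and "mb/s" here means 10^6 bytes per second.

```c
/* Peak transfer rate in mb/s for a memory clock in mhz and a bus width
 * in bits (16 or 32), assuming one bus-wide word per clock cycle. */
static unsigned peak_bandwidth_mb_per_s(unsigned mem_clock_mhz,
                                        unsigned bus_width_bits)
{
    return mem_clock_mhz * (bus_width_bits / 8u);
}
```

for example, a 32-bit interface at 183 mhz gives 732 mb/s, and the 16-bit mode at the same clock gives half of that.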
12.3 main-memory address aperture

pnx1300's local main memory is just one of three apertures into the 4-gb address space of the dspcpu:
• sdram (0.5 to 64 mb in size),
• mmio (2 mb in size), and
• pci (any address not in sdram or mmio).

mmio registers control the positions of the address-space apertures. the sdram aperture begins at the absolute address specified in the mmio register dram_base and extends upward to the address specified in the dram_limit register. if the sdram aperture overlaps the memory hole, the memory hole is ignored.

the mmio aperture begins at the address in mmio_base, which defaults to 0xefe00000 after power-up, and extends upwards 2 mb. (see chapter 3, "dspcpu architecture," for a detailed discussion.)

all addresses that fall outside these two apertures are assumed to be part of the pci address aperture.

1. in this document, the term "rank" is used to refer to a group of memory devices that are accessed together. historically, the term "bank" has been used in this context; to avoid confusion, this document uses bank to refer to on-chip organization (sdram devices have two or four internal banks) and rank to refer to off-chip, system-level organization.

table 12-1. memory system features
characteristic       comments
data width           16 and 32 bits
number of ranks      four chip-select signals support up to four ranks (can be used as addresses)
memory size          from 512 kb to 64 mb
devices supported    • jedec sgram (dsf tied low)
                     • jedec sdram (x4, x8, x16, x32)
                     • pc100/133 and later
clock rate           up to 183 mhz sdram speed (programmable ratio between core clock and memory system clock)
bandwidth            732 mb/s (at 183 mhz and 32-bit i/f)
glueless interface   • up to 2 chips at 183 mhz (e.g., 32 mb memory with 4x1mx32 sdram)
                     • up to 4 chips at 166 mhz (e.g., 64 mb memory with 4x1mx32 sdram)
signal levels        3.3-v lvttl
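the three-aperture decode described in section 12.3 can be modelled in a few lines. this is a sketch, not the hardware's decode logic: the names are hypothetical, dram_limit is assumed to be exclusive (the text only says the aperture "extends upward to" it), and sdram is assumed to take precedence if the apertures were ever programmed to overlap.

```c
#include <stdint.h>

typedef enum { APERTURE_SDRAM, APERTURE_MMIO, APERTURE_PCI } aperture_t;

#define MMIO_APERTURE_BYTES 0x00200000u   /* 2 mb, per the text */

/* Classifies a dspcpu address. dram_limit is treated as exclusive (assumption). */
static aperture_t classify_address(uint32_t addr, uint32_t dram_base,
                                   uint32_t dram_limit, uint32_t mmio_base)
{
    if (addr >= dram_base && addr < dram_limit)
        return APERTURE_SDRAM;
    /* Unsigned subtraction wraps for addr < mmio_base, giving a huge value
     * that fails the range test without overflowing mmio_base + 2 mb. */
    if ((uint32_t)(addr - mmio_base) < MMIO_APERTURE_BYTES)
        return APERTURE_MMIO;
    return APERTURE_PCI;                  /* everything else goes to pci */
}
```

with the power-up default mmio_base of 0xefe00000, the mmio aperture in this model covers 0xefe00000 up to but not including 0xf0000000.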
12.4 memory devices supported

all devices must have a lvttl, 3.3-v interface. table 12-2 lists the devices and organizations supported in a 32-bit memory interface.

refer to section 12.8, "address mapping," in order to evaluate the support of 2-bank, 64-mbit devices. these devices are not widely used; hence they are not described in this document.

table 12-3 lists the devices and organizations supported in a 16-bit memory interface.

12.4.1 sdram

pnx1300 supports synchronous dram chips directly. sdram has a fast, synchronous interface that permits burst transfers at 1 word per clock cycle. the memory inside an sdram device is divided into two or four banks; the sdram implements interleaved bank access to sustain maximum bandwidth.

sdram devices implement a power-down mechanism with self-refresh. pnx1300 power management takes advantage of this mechanism.

pnx1300 supports only jedec-compatible sdram with two or four internal banks of memory per device.

12.4.2 sgram

also supported in pnx1300 systems, sgram is essentially an sdram with additional features for raster graphics functions. the device type is standardized by jedec and offered by multiple dram vendors. tying the dsf input of an sgram low makes the device operate like a standard 32-bit-wide sdram and thus makes it compatible with the pnx1300 memory interface. pnx1300 does not support the new types of sgrams that have a ddr interface.

12.5 memory granularity and sizes

pnx1300 supports a variety of memory sizes thanks to:
• many possible configurations of sdram devices
• support for up to four memory ranks

the minimum memory size is 4 mb using two 2x512kx16 sdram devices on the 32-bit data bus, or 2 mb with one of these devices on a 16-bit data bus.

up to two memory devices can be connected without any glue logic and without sacrificing performance.
the maximum memory size with full performance is 64 mb using two 4x4mx16 sdram chips on a 32-bit data bus, and 32 mb using one 4x4mx16 sdram chip on a 16-bit data bus.

several memory configurations can be constructed using more devices. to do so, the frequency of the memory interface must be lowered to account for extra propagation delay due to the excessive loading on the interface signals (see section 12.13, "output driver capacity").

table 12-2. supported rank configurations (32-bit)
device size (mbit)   device(s)          rank size
16                   2x512kx16 sdram    4 mb
                     2x1mx8 sdram       8 mb
                     2x2mx4 sdram       16 mb
64                   4x512kx32 sdram    8 mb
                     4x1mx16 sdram      16 mb
                     4x2mx8 sdram       32 mb
128                  4x1mx32 sdram      16 mb
128 (1)              4x2mx16 sdram      32 mb (2)
256 (3)              4x4mx16 sdram      64 mb (4)
1. limited support for a 32-mb configuration only.
2. however mm_config.size may be set to 16 mb (i.e. 6). refer to figure 12-10 and figure 12-11 for the two possible connection details.
3. limited support for a 64-mb configuration only.
4. however mm_config.size is 32 mb (i.e. 7).

table 12-3. supported rank configurations (16-bit)
device size (mbit)   device(s)          rank size
16                   2x512kx16 sdram    2 mb
64                   4x1mx16 sdram      8 mb
128                  4x2mx16 sdram      16 mb (1)
256                  4x4mx16 sdram      32 mb (2)
1. however mm_config.size is set to 8 mb (i.e. 5).
2. however mm_config.size is set to 8 mb (i.e. 5).

figure 12-1. pnx1300 internal highway bus to the external glueless sdram interface. (block diagram: the dspcpu and on-chip peripherals connect over the data highway to the pnx1300 memory interface, which drives the sdram memory array through chip selects# (cs#); address, clock enables, ras#, cas#, and we# (address, control); byte enables[3:0] (dqm[3:0]); clock (clk, through a 33-ohm series resistor); and data[31:0] (dq[31:0]).)

the following rules apply to memory rank design:
• all devices in a rank must be of the same type.
• all ranks must be a power of two in size.
• all ranks must be of equal size.

table 12-4 lists some examples of 32-bit memory system designs. refer to the tm-1100 databook for smaller memory configurations.

note:
• some of these configurations may not be economically attractive due to the price premium.
• "max. mhz" refers to the memory interface/sdram speed, not the pnx1300 core operating frequency. the maximum mhz also depends on the device being used, i.e. pnx1300, pnx1311 or pnx1302. refer to section 1.9.7.10 on page 1-19 for maximum operating speeds.

table 12-5 lists the supported 16-bit memory configurations.

12.6 memory system programming

memory system parameters are determined by the contents of two configuration registers, mm_config and pll_ratios. table 12-6 describes the function of these registers, and figure 12-2 shows their formats.

to ensure compatibility with future devices, any undefined mmio bits should be ignored when read.

mm_config and pll_ratios are loaded from the boot eeprom, as described in section 13.4, "detailed eeprom contents." during this boot process, the memory interface is held in reset state. after the memory interface is released from reset, the contents of these registers cannot be altered.

these registers are visible in mmio space. they can be read, but writes have no effect.

12.6.1 mm_config register

the mm_config register tells the memory interface how to use the local dram memory. the fields in this register tell the interface the rank size and the refresh rate of the memory. table 12-7 summarizes the field functions.

refresh (refresh interval).
the 16-bit refresh field specifies the number of memory-system clock cycles between refresh operations. the default value of this field is 1000 (0x03e8). see section 12.11, "refresh," for more information.

bw (bus width). if set to "0" then the memory interface data bus width is 32 bits. if set to "1" then the memory interface data bus width is 16 bits.

size (rank size). the 3-bit size field specifies the size of each rank of dram. each rank must be the size specified by size. the default is a rank size of 4 mb. refer to table 12-7 for the interpretation of this field.

table 12-4. examples of 32-bit memory configurations
size (mb)  ranks  rank configurations                  max. mhz  peak mb/s
8          1      four 2x1mx8 sdram                    166       664
           2      per rank: two 2x512kx16 sdram        166       664
           1      one 4x512kx32 sdram                  183       732
16         1      two 4x1mx16 sdram                    183       732
           1      one 4x1mx32 sdram                    183       732
           2      per rank: one 4x512kx32 sdram        183       732
24         3      per rank: one 4x512kx32 sdram        166       664
32         1 (1)  two 4x2mx16 sdram                    183       732
           1      four 4x2mx8 sdram                    166       664
           2      per rank: two 4x1mx16 sdram          166       664
           2      per rank: one 4x1mx32 sdram          183       732
           4      per rank: one 4x512kx32 sdram        166       664
48         3      per rank: one 4x1mx32 sdram          166       664
64         1 (2)  two 4x4mx16 sdram                    183       732
           4      per rank: one 4x1mx32 sdram          166       664
1. however mm_config.size may be 16 mb (i.e. 6). refer to figure 12-10 and figure 12-11 for the two possible connection details.
2. however mm_config.size is 32 mb (i.e. 7).

table 12-5. supported 16-bit memory configurations
size (mb)  ranks  rank configurations    max. mhz  peak mb/s
8          1      one 4x1mx16 sdram      183       366
16 (1)     1      one 4x2mx16 sdram      183       366
32 (2)     1      one 4x4mx16 sdram      183       366
1. however mm_config.size is set to 8 mb (i.e. 5).
2. however mm_config.size is set to 8 mb (i.e. 5).
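the rank sizes in the configuration tables can be cross-checked from the device organization (banks x words per bank x data width), and the result mapped to the mm_config.size code listed in table 12-7. the sketch below uses hypothetical function names and models only the regular encoding (code n selects 256 kb shifted left by n); it does not model the footnoted special cases, where size is programmed to a value smaller than the physical rank.

```c
#include <stdint.h>

/* Bytes in one rank built from `devices` identical chips, each organized
 * as banks x words_per_bank x width_bits. */
static uint32_t rank_size_bytes(unsigned devices, unsigned banks,
                                uint32_t words_per_bank, unsigned width_bits)
{
    uint64_t bits = (uint64_t)devices * banks * words_per_bank * width_bits;
    return (uint32_t)(bits / 8u);
}

/* Returns the size code (1..7) for 512 kb .. 32 mb, or -1 if the rank
 * size has no direct encoding. Code 0 is reserved. */
static int mm_config_size_code(uint32_t rank_bytes)
{
    for (int code = 1; code <= 7; code++) {
        if (rank_bytes == ((uint32_t)256u * 1024u) << code)
            return code;
    }
    return -1;
}
```

for example, a rank of two 4x1mx16 devices works out to 16 mb, which corresponds to size code 6.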
12.6.2 pll_ratios register

the pll_ratios register controls the operation of the separate memory-interface and cpu plls. fields in this register determine if the plls are active and what input:output ratio each pll should generate. table 12-8 summarizes the field functions. figure 12-3 shows how the plls are connected and how fields in the pll_ratios register control them.

for normal operation both plls must be activated, i.e. {cd,cb,sd,sb} must be equal to 0000 (binary value). the operating limits of the internal plls are:
• 27 mhz < output of the sdram pll < 200 mhz
• 33 mhz < output of the cpu pll < 266 mhz
these are not the speed grades of the chips, just the pll limits.

table 12-6. memory configuration registers
register     purpose
mm_config    describes external memory configuration
pll_ratios   controls separate memory and cpu plls (phase-locked loops)

table 12-7. mm_config fields
field     function
refresh   refresh interval in memory clock cycles. default value 1000 (0x03e8).
size      memory rank size
          0   reserved
          1   512 kb
          2   1 mb
          3   2 mb
          4   4 mb
          5   8 mb
          6   16 mb
          7   32 mb

figure 12-2. memory interface configuration registers. (register layout diagram: mm_config (r/o) at mmio_base offset 0x10 0100 with the fields refresh, bw (16-bit memory interface), and size; pll_ratios (r/o) at mmio_base offset 0x10 0300 with the fields sb (sdram pll bypass), sd (sdram pll disable), cb (cpu pll bypass), cd (cpu pll disable), sr (sdram ratio), and cr (cpu ratio).)

table 12-8. pll_ratios fields
field  function
cr     cpu:memory ratio
       0     1:1
       1     2:1
       2     3:2
       3     4:3
       4     5:4
       5-7   reserved
sr     memory:external ratio
       0     2:1
       1     3:1
cd     cpu pll disable
       0     cpu pll on
       1     cpu pll off
cb     cpu pll bypass
       0     cpu ← pll
       1     cpu ← memory
sd     sdram pll disable
       0     sdram pll on
       1     sdram pll off
sb     sdram pll bypass
       0     memory ← pll
       1     memory ← external

figure 12-3. pnx1300 memory and core pll connections. (block diagram: the external clock input tri_clkin feeds the memory system pll, which produces the memory system clocks mm_clk0 and mm_clk1 and feeds the dspcpu pll, which produces the pnx1300 core clock; the pll_ratios fields cr, sr, cd, cb, sd, and sb control the two plls; tri_clkin also feeds the ddses and the evo pll (x3, x9) for the pnx1300 peripheral clocks.)

cr (cpu-to-memory pll ratio). the 3-bit cr field selects one of five input-to-output clock ratios for the cpu pll. the input clock is the memory system clock; the output clock determines the pnx1300 core operating frequency. the default value is "0", which implies a 1:1 cpu:memory ratio. see table 12-8 for the other encodings.

sr (memory-to-external pll ratio). the 1-bit sr field selects one of two memory-to-external clock ratios for the memory interface pll. the pll input is pnx1300's external input clock tri_clkin; the pll output determines the operating frequency of the memory interface and sdram devices. the default value is "0", which implies a 2:1 memory:external ratio. a value of "1" implies a 3:1 ratio.

cd (cpu pll disable). the 1-bit cd field determines whether or not the cpu pll is turned on. the reset value is "1", which disables operation of the cpu pll so that it dissipates almost no power. for normal operation the value should be "0", enabling the cpu pll.

cb (cpu pll bypass). the 1-bit cb field determines whether the input or the output of the cpu pll drives pnx1300's core logic. the default value is "1", which causes the pnx1300 core to be clocked by the input of the cpu pll (i.e., the memory interface clock). a value of "0" causes normal operation, and the core is clocked by the output of the cpu pll. note that if both cb and sb are set to "1" (bypass the cpu pll and the sdram pll), pnx1300's core logic is effectively clocked at the external input frequency.

note: it is illegal to use the output of a disabled pll. for example, it is illegal to have cd set to "1" while cb is set to "0".
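as a rough illustration of how the sr and cr encodings combine with the pll operating limits given above, the sketch below derives the memory and core clocks from the external input clock. it assumes both plls are enabled and not bypassed; the function name is hypothetical, and the limits are the pll limits from the text, not the speed grade of any particular part.

```c
#include <stdbool.h>

/* Computes the memory and cpu clocks (mhz) from the tri_clkin frequency
 * and the sr and cr field values. Returns false for a reserved encoding
 * or when either pll output falls outside its stated operating range. */
static bool pll_clocks_mhz(double ext_mhz, unsigned sr, unsigned cr,
                           double *mem_mhz, double *cpu_mhz)
{
    /* cr codes 0..4 select cpu:memory ratios 1:1, 2:1, 3:2, 4:3, 5:4 */
    static const unsigned cr_num[5] = {1, 2, 3, 4, 5};
    static const unsigned cr_den[5] = {1, 1, 2, 3, 4};

    if (sr > 1 || cr > 4)
        return false;

    double mem = ext_mhz * (sr == 0 ? 2.0 : 3.0);   /* sr: 2:1 or 3:1 */
    double cpu = mem * cr_num[cr] / cr_den[cr];

    if (!(mem > 27.0 && mem < 200.0))               /* sdram pll output limits */
        return false;
    if (!(cpu > 33.0 && cpu < 266.0))               /* cpu pll output limits   */
        return false;

    *mem_mhz = mem;
    *cpu_mhz = cpu;
    return true;
}
```

for instance, a 50-mhz input with sr = 1 and cr = 2 yields a 150-mhz memory clock and a 225-mhz core clock in this model.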
sd (sdram pll disable). the 1-bit sd field determines whether or not the sdram pll is turned on. the default value is "1", which disables the sdram pll. in this state, it dissipates almost no power. for normal operation the value should be "0", enabling the sdram pll.

sb (sdram pll bypass). the 1-bit sb field determines whether the input or the output of the sdram pll drives the memory interface and memory devices. the default value is "1", which causes the memory system to be clocked by the input of the sdram pll (pnx1300's external input clock). a value of "0" causes normal operation, and the memory system is clocked by the output of the sdram pll.

12.7 memory interface pin list

the memory interface consists of 61 signal pins including clocks (but excluding power and ground pins). table 12-9 lists the interface signal pins.

table 12-9. memory interface signal pins
name           function                                            i/o   active...
mm_clk[1:0]    memory bus clock                                    o     high
mm_cs#[3..0]   chip selects for the four memory ranks or address   o     low
mm_ras#        row-address strobe                                  o     low
mm_cas#        column address strobe                               o     low
mm_we#         write enable                                        o     low
mm_a[13:0]     address                                             o     high
mm_cke[1:0]    clock enable                                        o     high
mm_dqm[3:0]    byte enables for dq bus                             o     high
mm_dq[31:0]    bi-directional data bus                             i/o   high

12.8 address mapping

the address mapping is determined by the state of the rank-size bits and the bus width bit in the mm_config register.

12.8.1 address mapping in 32-bit mode

table 12-10 shows how internal address bits from the pnx1300 data highway bus are mapped to main-memory address-bus and chip select pins (mm_a[13:0], mm_cs#[3:0]) in 32-bit data bus mode.

the column "rank addr./h.way bits" specifies which internal data-highway address bits select the preliminary sdram rank. the actual rank used is subject to the limitation implied by the relationship between sdram aperture size (described in section 13.2.1) and the rank size.

table 12-10. 32-bit address mapping
rank    rank addr.   row address                                        column address                                   bank address
size    h.way bits   pins                      h.way bits               pins                   h.way bits                pin   h.way bit
4 mb    23-22        10-0                      21-11                    7-0                    10-6, 4-2                 11    5
8 mb    24-23        12, 10-0                  11, 22-12                12, 8-0                11, 11-6, 4-2             11    5
16 mb   25-24        13-12, 10-0               12-11, 23-13             12, 9-0                11, 12-6, 4-2             11    5
32 mb   -            cs#3, cs#2, 13-12, 10-0   25, 24, 12-11, 23-13     cs#3, cs#2, 12, 9-0    25, 24, 11, 12-6, 4-2     11    5
the rank is selected via the chip select bits, mm_cs#[3:0].

the column "row address/h.way bits" specifies which internal data-highway address bits map to the sdram row address. "row address/pins" specifies which lines of pnx1300's mm_a address bus serve as the sdram row address. for the 32-mb rank size the chip selects may be used as row address.

the column "column address/h.way bits" specifies which data-highway address bits map to the sdram column address. "column address/pins" specifies which lines of pnx1300's mm_a address bus serve as the sdram column address. for the 32-mb rank size the chip selects may be used as column address.

mm_a[12] is only defined for an 8- or 16-mb rank size. mm_a[12] contains h.way bit 11 during the ras and cas operations. mm_a[12] can be used as a bank select (4-bank sdrams) or as a row address (2-bank sdrams).

mm_a[13] is only defined for a 16-mb rank size. mm_a[13] contains h.way bit 12 during the ras operation. mm_a[13] can only be used as a row address.

for the 32-mb rank size the chip select pins mm_cs#[3:2] are used as addresses. mm_cs#2 is used as a bank select in addition to mm_a[11], and mm_cs#3 is used as a row address.

highway address bits 5-0 are the offset within a 64-byte block. they are all "0" for an aligned block transfer. table 12-10 lists the mapping of bits 5-2 to identify in which sdram positions the words of a block are located. bit 5 is always mapped to (one of) the sdram internal bank selects; thus, each sdram bank receives half (32 bytes) of the block transfer. highway address bits 4-2 are the word offset in a cache block. bits 1-0 are the byte offset within a 32-bit word.

12.8.2 address mapping in 16-bit mode

table 12-11 shows how internal address bits from the pnx1300 data highway bus are mapped to main-memory address-bus and chip select pins (mm_a[13:0], mm_cs#[3:2]) in 16-bit data bus mode.
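the low-order highway address fields described for 32-bit mode in section 12.8.1 (byte offset in bits 1-0, word offset in bits 4-2, bank-select bit 5, all inside a 64-byte block) can be separated with simple masks. this is an illustrative sketch only; the struct and function names are hypothetical, and the rank-size-dependent row and column mapping is not modelled.

```c
#include <stdint.h>

typedef struct {
    uint32_t block_base;   /* address of the enclosing 64-byte block        */
    unsigned bank_half;    /* bit 5: which internal bank gets this 32 bytes */
    unsigned word_offset;  /* bits 4-2: 32-bit word within the half-block   */
    unsigned byte_offset;  /* bits 1-0: byte within the 32-bit word         */
} hwy_block_fields_t;

/* Splits a highway address into the block-offset fields named in the text. */
static hwy_block_fields_t split_block_offset(uint32_t addr)
{
    hwy_block_fields_t f;
    f.block_base  = addr & ~(uint32_t)63u;
    f.bank_half   = (addr >> 5) & 1u;
    f.word_offset = (addr >> 2) & 7u;
    f.byte_offset = addr & 3u;
    return f;
}
```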
12.9 memory interface and sdram initialization

immediately after reset, the main-memory interface is initialized by placing default values in the mm_config and pll_ratios registers (see section 12.6, "memory system programming"). during the subsequent hardware boot process, when pnx1300 reads initial values from an external rom, these registers can be set to different values.

after pnx1300 is released from the reset state, the memory interface automatically executes 10 refresh operations, then initializes the mode register in each sdram chip. table 12-12 shows the settings in the sdram mode register(s).

12.10 on-chip sdram interleaving

the main-memory interface (mmi) takes advantage of the on-chip interleaving of sdram devices. interleaving allows the precharge, ras, and cas commands needed to access one internal bank to be performed while useful data transfer is occurring with the other internal bank. thus, the overhead of preparing one bank is hidden during data movement to or from the other.

the benefit of on-chip interleaving is sustainable full-bandwidth data transfer (1 word per clock cycle). the transition from one internal bank to the other happens on 8-word boundaries; transferring 8 words gives the inactive bank time to prepare (perform precharge, ras, and cas) so that when the last word of the 8-word block in the active bank has been transferred, the next word from the just-precharged bank is ready on the next cycle.

the seamless transitions between the two on-chip banks can be sustained for a stream of contiguous addresses with the same direction (read or write). that is, a stream of contiguous reads or contiguous writes can sustain full bandwidth. if a write follows a read, then a small gap between transfers is needed.

each bank access is terminated with a read or write with automatic precharge, making a separate precharge command before the next ras unnecessary.
For 4-bank SDRAM devices, the signals used as bank addresses are interchangeable (i.e., it does not matter which of the two signals is connected to bank 1 or bank 0 of the SDRAM device).

Table 12-11. 16-bit address mapping

  Rank size | Rank addr. (h.way bits) | Row address (pins) | Row address (h.way bits) | Column address (pins) | Column address (h.way bits) | Bank address (pins) | Bank address (h.way bit)
  2 MB      | –                       | 9–0                | 20–11, 5                 | 7–0                   | 10–6, 3–1                   | 11                  | 4
  8 MB      | –                       | CS#3, CS#2, 13–12, 10–0 | 24, 23, 12–11, 22–13, 5 | CS#3, CS#2, 12, 8–0 | 24, 23, 11, 11–6, 3–1    | 11                  | 4

Table 12-12. SDRAM mode register settings

  Parameter      Value
  Burst length   4
  Wrap type      Interleaved
  CAS latency    3

12.11 Refresh

The MMI performs SDRAM refresh cycles autonomously using the CAS-before-RAS (CBR) mechanism. SDRAMs have a 4K refresh interval: either 4096 rows must be refreshed every 64 ms, or 2048 rows every 32 ms, or one row every 15.62 µs. Newer SDRAM devices (i.e., the 256-Mbit generation) support an 8K refresh interval and therefore require one row refresh every 7.81 µs.

The MMI performs refresh at timed intervals: one CBR refresh command must be issued every 15.6 µs or every 7.81 µs. A counter in the MMI keeps track of the number of SDRAM clock cycles between refresh operations. This counter starts after the CBR operation has completed; the CBR operation takes 19 cycles. When the counter reaches a programmed limit, the next refresh operation is due, and the next-in-line data transfer request from the data highway is delayed until the CBR operation is executed.

All devices in the main-memory system are refreshed simultaneously. The REFRESH field in the MM_CONFIG register determines the number of memory-system clock cycles (as distinguished from PNX1300 core clock cycles) between the CBR refresh operations.

Each CBR refresh operation takes 19 SDRAM clock cycles. Thus, at 100 MHz, refresh consumes about 1.2% of the maximum available SDRAM bandwidth (19 cycles out of 1560). The bandwidth impact is slightly lower at higher frequencies.

Table 12-13 lists the number of memory-system clocks for typical SDRAM operation speeds with a 15.62-µs refresh period. This number includes the worst-case scenario in order to guarantee the 15.62-µs refresh period.

Table 12-14 lists the number of memory-system clocks for typical SDRAM operation speeds with a 7.81-µs refresh period. This number includes the worst-case scenario in order to guarantee the 7.81-µs refresh period.

12.12 Power-Down Mode

When PNX1300 is put into power-down mode to reduce power consumption, the MMI responds by putting the SDRAM devices into their power-down mode. In this mode, the SDRAM devices retain their contents through self-refresh.
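The refresh arithmetic of Section 12.11 can be checked with a short calculation. The sketch below is illustrative only: the function names are hypothetical, and it assumes that the nominal refresh interval in clocks is simply the refresh period multiplied by the SDRAM clock frequency. It reproduces the roughly 1.2% overhead quoted for 100 MHz and shows how far the 100-MHz decimal entry of Table 12-13 (1523) sits below the nominal cycle count, which is consistent with the worst-case allowance the tables are said to include.

```python
CBR_CYCLES = 19  # SDRAM clock cycles per CBR refresh, as stated in the text


def nominal_refresh_cycles(sdram_mhz, period_us=15.62):
    """SDRAM clock cycles in one refresh period (MHz x microseconds = cycles)."""
    return sdram_mhz * period_us


def refresh_overhead(sdram_mhz, period_us=15.62):
    """Fraction of SDRAM bandwidth spent on CBR refresh (hypothetical helper)."""
    return CBR_CYCLES / nominal_refresh_cycles(sdram_mhz, period_us)


def margin_cycles(sdram_mhz, table_value, period_us=15.62):
    """How far a tabulated REFRESH value sits below the nominal cycle count."""
    return nominal_refresh_cycles(sdram_mhz, period_us) - table_value


if __name__ == "__main__":
    for mhz in (100, 143, 183):
        print(f"{mhz} MHz: {refresh_overhead(mhz) * 100:.2f}% of bandwidth")
    # 1523 is the 100-MHz decimal entry of Table 12-13.
    print(f"margin at 100 MHz: {margin_cycles(100, 1523):.0f} cycles")
```

The same helpers accept `period_us=7.81` for devices with an 8K refresh interval.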
Table 12-13. REFRESH value for a 15.62-µs period

  SDRAM operation speed   Value for REFRESH field (decimal, hexadecimal)
  100 MHz                 1523, 0x05F3
  125 MHz                 1914, 0x0779
  133 MHz                 2038, 0x07F6
  143 MHz                 2195, 0x0892
  166 MHz                 2554, 0x09F9
  183 MHz                 2819, 0x0B03

Table 12-14. REFRESH value for a 7.81-µs period

  SDRAM operation speed   Value for REFRESH field (decimal, hexadecimal)
  100 MHz                 742, 0x02E6
  125 MHz                 936, 0x03A9
  133 MHz                 992, 0x03E7
  143 MHz                 1072, 0x0435
  166 MHz                 1256, 0x04E9
  183 MHz                 1384, 0x05E6

12.13 Output Driver Capacity

PNX1300's output driver circuits for the memory address and control signals (output signals in Table 12-9) can drive up to two memory devices when the memory interface is operating at 183 MHz. If more devices are connected, then a lower SDRAM clock frequency must be chosen. Table 12-15 lists the clock frequency as a function of the number of memory devices connected to unbuffered memory interface signals.

Two identical outputs are provided for both the MM_CKE (clock-enable) and MM_CLK signals. Each MM_CKE and MM_CLK signal is capable of driving one SDRAM device at 183 MHz.

12.14 Signal Propagation Delay Compensation

The PNX1300 MMI no longer has the two special pins, MM_MATCHOUT and MM_MATCHIN, that were used in the TM-1100 and TM-1000. This loop helped the interface compensate for the propagation delay through circuit-board traces to and from the external SDRAM devices. It is now integrated into the MMI. Read timing is internally derived.

To avoid excessive ringing of the clock signals, series termination with a 33-Ω resistor is advised at the clock outputs.

The delay of the memory clock with respect to the internal sending and receiving clocks is adjusted inside the memory interface to achieve reliable communication and guarantee correct setup and hold times.

Figure 12-4 shows a conceptual circuit board layout. Two SDRAM devices share a single clock output. The clock signals should have source-series termination.

12.15 Circuit Board Design

PNX1300 and its memory array form a high-speed digital system. Even though only a small number of chips is involved, this digital system operates at frequencies high enough to make the analog characteristics of the connections between the chips significant. Consequently, the system designer must take care to ensure reliable operation.

12.15.1 General Guidelines

• In general, PNX1300 and its memory chips must be as close together as possible to minimize parasitic capacitance. Close proximity is especially important for a 183-MHz memory system.
• Signal traces between PNX1300 and the memory chips must be matched in length as closely as possible to minimize signal skew.
• The clock-signal trace(s) must be as short as possible.
• Address and control-signal traces should also be short, but their length is less critical than the clock's.
• Data-signal traces should also be short, but their length is less critical than the clock's, especially if only one or two ranks are connected.
• Connections to several loads must follow a "T" connection scheme in order to limit reflections.

12.15.2 Specific Guidelines

• The maximum length for a signal trace should be 10 cm. For 183-MHz operation, signal trace length must not be longer than 7 cm.
• The maximum capacitive load is 30 pF per trace, including loads.
• The signal traces on the PNX1300 circuit board must be designed as 50-Ω transmission lines.
• At most one SDRAM device may be connected to each MM_CLK signal at 183 MHz.

12.15.3 Termination

No termination is required for address, data, and control signals. Address and control signals are driven only by PNX1300; the output impedance of the drivers is sufficiently matched to prevent excessive ringing. The PNX1300 design assumes that when driving data lines, the output drivers of the SDRAM chips are also sufficiently impedance matched.

Series termination of the clock outputs with a 33-Ω resistor is advised.

12.16 Timing Budget

The glueless interface of the PNX1300 main-memory interface makes the memory system simple and straightforward from one point of view, but to ensure reliable operation at high clock rates, system designers must follow the board design guidelines (see Section 12.15, "Circuit Board Design").
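As a rough aid, the numeric limits from Section 12.13 (Table 12-15) and Section 12.15.2 can be collected into a simple design-rule check. This is a minimal sketch, not a substitute for signal-integrity analysis: the function names are hypothetical, and the handling of device counts that fall between table entries (rounding up to the next listed count) is an assumption rather than something the data book states.

```python
# Table 12-15: devices on unbuffered address/control signals -> max clock (MHz)
MAX_MHZ_BY_CHIPS = {2: 183, 4: 166, 8: 133}


def max_clock_mhz(num_chips):
    """Maximum SDRAM clock for a device count, rounding up to a table entry.

    Assumption: a count between two entries uses the next larger entry.
    """
    if num_chips < 1:
        raise ValueError("at least one memory device is required")
    for chips in sorted(MAX_MHZ_BY_CHIPS):
        if num_chips <= chips:
            return MAX_MHZ_BY_CHIPS[chips]
    raise ValueError("more than 8 devices is outside Table 12-15")


def check_board(num_chips, clock_mhz, longest_trace_cm, trace_load_pf):
    """Return a list of violated guidelines (empty list means none found)."""
    problems = []
    if clock_mhz > max_clock_mhz(num_chips):
        problems.append("clock too fast for the number of memory devices")
    # 10 cm in general, 7 cm for 183-MHz operation (Section 12.15.2)
    limit_cm = 7 if clock_mhz >= 183 else 10
    if longest_trace_cm > limit_cm:
        problems.append(f"signal trace longer than {limit_cm} cm")
    if trace_load_pf > 30:
        problems.append("capacitive load above 30 pF per trace")
    return problems


if __name__ == "__main__":
    print(check_board(num_chips=2, clock_mhz=183,
                      longest_trace_cm=6.5, trace_load_pf=25))
    print(check_board(num_chips=4, clock_mhz=183,
                      longest_trace_cm=9.0, trace_load_pf=35))
```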
SDRAM devices must meet the critical specifications listed in Table 12-16 to ensure reliable operation of a 143-MHz (t_cycle = 7 ns) memory system.

Table 12-15. Glueless interface limits for address/clocks

  Memory chips   Maximum clock frequency
  2              183 MHz
  4              166 MHz
  8              133 MHz

[Figure 12-4. Conceptual board layout. Blocks shown: DSPCPU and PNX1300 on-chip peripherals on the data highway; the PNX1300 memory interface driving address, clock enables, RAS#, CAS#, WE#, a clock output with a 33-Ω series resistor, and data[31:0]; two SDRAM devices, each with address & control, CLK, and DQ[31:0].]

Table 12-16. Critical 143-MHz SDRAM parameters

  Timing parameter              Value
  Max. output delay t_AC        6.4 ns
  Min. output hold time t_OH    2.0 ns
  Max. input setup time t_IS    2.0 ns
  Max. input hold time t_IH     1.0 ns

For 166-MHz operation (t_cycle = 6 ns), SDRAM devices must meet the critical specifications listed in Table 12-17. For 183-MHz operation (t_cycle = 5.4 ns), SDRAM devices must meet the critical specifications listed in Table 12-18.

These values leave virtually no margin for the critical timing parameters in a high-speed system. They assume a total worst-case delay ranging from 0.6 ns at 143 MHz down to 0.4 ns at 183 MHz (as the operating frequency rises, the trace layout must be improved to reduce trace delay as well as skew) and a t_SU for PNX1300 of 0 ns.

The maximum operating frequency is usually computed with the following equation:

$t_{cycle} = t_{AC} + t_{board} + t_{CS} + t_{SU}$

where t_CS is the skew between MM_CLK0 and MM_CLK1, t_SU is the input data setup time as defined in Section 1.9.7.10 on page 1-19, and t_board includes trace delay and trace skew.

12.16.1 Main AC Parameter Requirements

The PNX1300 SDRAM interface was designed to support a wide range of SDRAM vendors. Table 12-19 describes some of the minimum SDRAM AC requirements for PNX1300 to operate correctly. The symbols and names are not standardized and may differ from one vendor to another. The table is not meant to be exhaustive and shows only the main parameters. Parameters are expressed in clock cycles rather than in ns.

12.17 Example Block Diagrams

The following figures illustrate some of the memory configurations that can be built with PNX1300. In all of them, the signals used as bank addresses are interchangeable (i.e., it does not matter which of the two signals is connected to bank 1 or bank 0 of the SDRAM device).

12.17.1 Block Diagrams for a 32-Bit Interface

The following sections present examples of possible connections with 16-, 64-, 128-, and 256-Mbit SDRAMs. MM_CONFIG.BW must be set to '0' (refer to BW, Section 12.6.1).

12.17.1.1 16-Mbit Devices or Less

These devices allow small memory configurations to be built. They are described in more detail in the TM-1000 and TM-1100 data books.

Table 12-17. Critical 166-MHz SDRAM parameters

  Timing parameter              Value
  Max. output delay t_AC        5.5 ns
  Min. output hold time t_OH    2.0 ns
  Max. input setup time t_IS    1.5 ns
  Max. input hold time t_IH     1.0 ns

Table 12-18. Critical 183-MHz SDRAM parameters

  Timing parameter              Value
  Max. output delay t_AC        5.0 ns
  Min. output hold time t_OH    2.0 ns
  Max. input setup time t_IS    1.5 ns
  Max. input hold time t_IH     1.0 ns

Table 12-19. Minimum AC parameters

  Description                        Symbol   Clocks
  Active command period              t_RC     10
  Active to precharge command        t_RAS    7
  Precharge command period           t_RP     3
  Active bank A to active bank B     t_RRD    3
  Active to read or write command    t_RCD    3
  Write recovery time                t_WR     2
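The timing budget of Section 12.16 can be sanity-checked numerically. The sketch below assumes the budget has the form t_cycle = t_AC + t_board + t_CS + t_SU, as the parameter definitions in that section suggest, and it treats the "total worst-case delay" quoted in the text (0.6 ns at 143 MHz, 0.4 ns at 183 MHz) as the sum t_board + t_CS. That lumping, and the function names, are assumptions made for illustration.

```python
def min_cycle_ns(t_ac, t_delay, t_su=0.0):
    """Shortest usable clock period in ns.

    t_ac    -- SDRAM max. output delay (Tables 12-16 to 12-18)
    t_delay -- lumped board delay plus clock skew (assumed t_board + t_CS)
    t_su    -- PNX1300 input setup time (0 ns per Section 12.16)
    """
    return t_ac + t_delay + t_su


def max_frequency_mhz(t_ac, t_delay, t_su=0.0):
    """Highest clock frequency (MHz) that still satisfies the budget."""
    return 1000.0 / min_cycle_ns(t_ac, t_delay, t_su)


if __name__ == "__main__":
    # (label, t_AC in ns, worst-case delay in ns) taken from the text above
    for label, t_ac, t_delay in (("143 MHz", 6.4, 0.6), ("183 MHz", 5.0, 0.4)):
        print(f"{label}: t_cycle >= {min_cycle_ns(t_ac, t_delay):.1f} ns, "
              f"f_max ~ {max_frequency_mhz(t_ac, t_delay):.1f} MHz")
```

With the quoted values the required period comes out equal to the stated t_cycle (7 ns and 5.4 ns), which illustrates why the text says these parameters leave virtually no margin.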
12.17.1.2 64-Mbit Devices

64-Mbit SDRAMs organized in x32 can be used to build an 8-, 16-, 24-, or 32-MB memory system. Figure 12-5 shows an 8-MB memory system (one device only), and Figure 12-6 details an extension of the block diagram in order to build a 16-MB configuration.

[Figure 12-5. Schematic of an 8-MB memory system consisting of one 4 × 512K × 32 SDRAM (one rank). PNX1300 signals shown: MM_CS#[0], MM_RAS#, CAS#, WE#, CKE, MM_A[10:0], MM_A[12,11], MM_CLK[1:0] (33-Ω series resistor), MM_DQ[31:0], MM_DQM[3:0]. SDRAM pins shown: DQ[31:0], DQM[3:0], CLK, address[10:0], BA[1:0], control, CS#.]

[Figure 12-6. Schematic of a 16-MB memory system consisting of two ranks of 4 × 512K × 32 SDRAM chips. PNX1300 signals shown: MM_CS#[1:0], MM_RAS#, CAS#, WE#, CKE, MM_A[10:0], MM_A[12,11], MM_CLK[1:0] (33-Ω series resistor), MM_DQ[31:0], MM_DQM[3:0]. One device is selected by MM_CS#[0] and the other by MM_CS#[1]; both are labeled with MM_CLK[0]. SDRAM pins shown: DQ[31:0], DQM[3:0], CLK, address[10:0], BA[1:0], control, CS#.]
64-Mbit SDRAMs organized in x16 can be used to build 16-, 32-, 48-, or 64-MB memory systems. Figure 12-7 details a 32-MB memory system. Removing the devices controlled by MM_CS#[1] makes a 16-MB system.

[Figure 12-7. Schematic of a 32-MB memory system consisting of four 4 × 1M × 16 SDRAM chips (two ranks). PNX1300 signals shown: MM_CS#[1:0], MM_RAS#, CAS#, WE#, CKE, MM_A[13,10:0], MM_A[12,11], MM_CLK[1:0] (33-Ω series resistor), MM_DQ[31:0], MM_DQM[3:0]. Each rank uses one device on MM_DQ[31:16] with MM_DQM[3:2] and one device on MM_DQ[15:0] with MM_DQM[1:0]; the ranks are selected by MM_CS#[0] and MM_CS#[1]. SDRAM pins shown: DQ[15:0], DQM[1:0], CLK, address[11:0], BA[1:0], control, CS#.]
64-Mbit SDRAMs organized in x8 can be used to build a 32-MB memory system as illustrated in Figure 12-8. Note that due to the unusual way of using the devices, this is the only supported configuration with x8 devices. MM_CONFIG.SIZE must be set to 6 (i.e., 16-MB rank size, Section 12.6.1).

[Figure 12-8. Schematic of a 32-MB memory system consisting of four 4 × 2M × 8 SDRAM chips (one rank). PNX1300 signals shown: MM_RAS#, CAS#, WE#, CKE, MM_A[13,10:0], MM_A[11], MM_CS#[1], MM_CLK[1:0] (33-Ω series resistor), MM_DQ[31:0], MM_DQM[3:0]. The four devices take MM_DQ[31:24], MM_DQ[23:16], MM_DQ[15:8], and MM_DQ[7:0], with MM_DQM[3], MM_DQM[2], MM_DQM[1], and MM_DQM[0]. SDRAM pins shown: DQ[7:0], DQM, CLK, address[11:0], BA[1:0], control, and CS# tied to GND.]
12.17.1.3 128-Mbit Devices

128-Mbit SDRAMs organized in x16 are partially supported. Support is provided for a 32-MB memory system. It can contain only one rank (i.e., it cannot be extended using the other MM_CS# pins). There are two possible connection schemes.

Figure 12-9 is backward compatible with TM-1300. MM_CONFIG.SIZE must be set to 6 (i.e., 16-MB rank size, Section 12.6.1).

[Figure 12-9. Schematic of a 32-MB memory system consisting of two 4 × 2M × 16 SDRAM chips (one rank). PNX1300 signals shown: MM_RAS#, CAS#, WE#, CKE, MM_A[13,10:0], MM_A[11], MM_CS#[1], MM_CLK[1:0] (33-Ω series resistor), MM_DQ[31:0], MM_DQM[3:0]. One device takes MM_DQ[31:16] with MM_DQM[3:2]; the other takes MM_DQ[15:0] with MM_DQM[1:0]. SDRAM pins shown: DQ[15:0], DQM[1:0], CLK, address[11:0], BA[1:0], control, and CS# tied to GND.]
Figure 12-10 is not backward compatible with TM-1300. MM_CONFIG.SIZE must be set to 7 (i.e., 32-MB rank size, Section 12.6.1). This new scheme has the advantage of being compatible with Figure 12-12, which makes it possible to build a board that accepts either a 32-MB or a 64-MB memory system with exactly the same footprint.

[Figure 12-10. Schematic of a 32-MB memory system consisting of two 4 × 2M × 16 SDRAM chips (one rank). PNX1300 signals shown: MM_RAS#, CAS#, WE#, CKE, MM_A[13,10:0], MM_A[11], MM_CS#[2], MM_CLK[1:0] (33-Ω series resistor), MM_DQ[31:0], MM_DQM[3:0]. One device takes MM_DQ[31:16] with MM_DQM[3:2]; the other takes MM_DQ[15:0] with MM_DQM[1:0]. SDRAM pins shown: DQ[15:0], DQM[1:0], CLK, address[11:0], BA[1:0], control, and CS# tied to GND.]
128-Mbit SDRAMs organized in x32 can be used to build 16-, 32-, 48-, or 64-MB memory systems. A 32-MB system is pictured in Figure 12-11. A 16-MB system can be obtained by removing the device controlled by MM_CS#[1]. Similarly, the system can be extended to 48 or 64 MB by adding devices controlled by MM_CS#[3:2].

[Figure 12-11. Schematic of a 32-MB memory system consisting of two ranks of 4 × 1M × 32 SDRAM chips. PNX1300 signals shown: MM_CS#[1:0], MM_RAS#, CAS#, WE#, CKE, MM_A[13,10:0], MM_A[12,11], MM_CLK[1:0] (33-Ω series resistor), MM_DQ[31:0], MM_DQM[3:0]. One device is labeled with MM_CS#[0] and MM_CLK[1], the other with MM_CS#[1] and MM_CLK[0]. SDRAM pins shown: DQ[31:0], DQM[3:0], CLK, address[11:0], BA[1:0], control, CS#.]
12.17.1.4 256-Mbit Devices

256-Mbit SDRAMs organized in x16 can be used to build a 64-MB memory system. Figure 12-12 details a 64-MB memory system. MM_CONFIG.SIZE must be set to 7 (i.e., 32-MB rank size, Section 12.6.1).

Note: the connections described in Figure 12-12 for the 256-Mbit SDRAMs organized in x16 can also be used to connect 128-Mbit SDRAM devices organized in x16, allowing the same footprint on the board for two different memory size configurations (i.e., 64 MB or 32 MB). Refer to Figure 12-10 for detailed connection of the 32-MB case.

[Figure 12-12. Schematic of a 64-MB memory system consisting of two 4 × 4M × 16 SDRAM chips (one rank). PNX1300 signals shown: MM_RAS#, CAS#, WE#, CKE, MM_CS#3 with MM_A[13,10:0], MM_A[11] with MM_CS#2, MM_CLK[1:0] (33-Ω series resistor), MM_DQ[31:0], MM_DQM[3:0]. One device takes MM_DQ[31:16] with MM_DQM[3:2]; the other takes MM_DQ[15:0] with MM_DQM[1:0]. SDRAM pins shown: DQ[15:0], DQM[1:0], CLK, address[12:0], BA[1:0], control, and CS# tied to GND.]
12.17.2 Block Diagrams for a 16-Bit Interface

Figure 12-13, Figure 12-14, and Figure 12-15 detail the SDRAM connections for 64-, 128-, and 256-Mbit SDRAMs organized in x16. They build memory systems of 8, 16, and 32 MB, respectively. MM_CONFIG.SIZE must be set to 5 (i.e., 8-MB rank size, Section 12.6.1) for all of the pictured configurations. MM_CONFIG.BW must be set to '1' (refer to BW, Section 12.6.1).

Note: the connections described in Figure 12-15 for the 256-Mbit SDRAM device organized in x16 can also be used to connect a 128-Mbit SDRAM device organized in x16 (Figure 12-14), allowing the same footprint on the board for two different memory size configurations (i.e., 32 MB or 16 MB).

[Figure 12-13. Schematic of an 8-MB memory system consisting of one 4 × 1M × 16 SDRAM chip (one rank). PNX1300 signals shown: MM_RAS#, CAS#, WE#, CKE, MM_A[13,10:0], MM_A[12,11], MM_CLK[0] (33-Ω series resistor), MM_DQ[31:0], MM_DQM[3:0]; the device uses MM_DQ[15:0] and MM_DQM[1:0]. SDRAM pins shown: DQ[15:0], DQM[1:0], CLK, address[11:0], BA[1:0], control, and CS# tied to GND.]

[Figure 12-14. Schematic of a 16-MB memory system consisting of one 4 × 2M × 16 SDRAM chip (one rank). PNX1300 signals shown: MM_RAS#, CAS#, WE#, CKE, MM_A[13,10:0], MM_A[11] with MM_CS#2, MM_CLK[0] (33-Ω series resistor), MM_DQ[31:0], MM_DQM[3:0]; the device uses MM_DQ[15:0] and MM_DQM[1:0]. SDRAM pins shown: DQ[15:0], DQM[1:0], CLK, address[11:0], BA[1:0], control, and CS# tied to GND.]
[Figure 12-15. Schematic of a 32-MB memory system consisting of one 4 × 4M × 16 SDRAM chip (one rank). PNX1300 signals shown: MM_RAS#, CAS#, WE#, CKE, MM_CS#3 with MM_A[13,10:0], MM_A[11] with MM_CS#2, MM_CLK[0] (33-Ω series resistor), MM_DQ[31:0], MM_DQM[3:0]; the device uses MM_DQ[15:0] and MM_DQM[1:0]. SDRAM pins shown: DQ[15:0], DQM[1:0], CLK, address[12:0], BA[1:0], control, and CS# tied to GND.]
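The memory sizes quoted throughout Section 12.17 follow from the device density, the device width, and the interface width. The sketch below shows that arithmetic; the helper names are hypothetical, and the calculation deliberately ignores which configurations PNX1300 actually supports (for example, the x8 and 128-Mbit x16 cases are limited to one rank, as stated above).

```python
def devices_per_rank(device_width, bus_width=32):
    """Number of SDRAM chips needed to fill the data bus once."""
    if bus_width % device_width != 0:
        raise ValueError("device width must divide the bus width")
    return bus_width // device_width


def rank_size_mb(density_mbit, device_width, bus_width=32):
    """Capacity in MB of one rank built from identical devices."""
    return devices_per_rank(device_width, bus_width) * density_mbit // 8


def system_sizes_mb(density_mbit, device_width, bus_width=32, max_ranks=4):
    """Capacities reachable with 1..max_ranks ranks (arithmetic only)."""
    rank = rank_size_mb(density_mbit, device_width, bus_width)
    return [rank * n for n in range(1, max_ranks + 1)]


if __name__ == "__main__":
    print(system_sizes_mb(64, 32))             # 64-Mbit x32, 32-bit interface
    print(system_sizes_mb(64, 16))             # 64-Mbit x16, 32-bit interface
    print(rank_size_mb(256, 16))               # 256-Mbit x16, 32-bit interface
    print(rank_size_mb(256, 16, bus_width=16)) # 256-Mbit x16, 16-bit interface
```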
System Boot
Chapter 13
by Gert Slavenburg, Bob Bradfield, and Hani Salloum

13.1 Boot Sequence Overview

In this document, the generic PNX1300 name refers to the PNX1300 series, or the PNX1300/01/02/11 products.

Before a PNX1300 system can begin operating, the main-memory interface (MMI) registers and the on-chip clock ratio register must be configured. Since the DSPCPU cannot begin operating until after these registers and circuits are initialized, the DSPCPU cannot be relied on to initialize these resources. Consequently, PNX1300 needs an independent bootstrap facility for low-level initialization.

PNX1300 implements low-level system initialization by combining a small block of on-chip system boot logic with a single external serial boot EEPROM connected to the I²C interface. See Figure 13-1. Serial EEPROMs with an I²C interface are slow but have the advantages of being space-efficient and inexpensive. The amount of information needed for initial system boot is small, so speed is not a concern.

The PNX1300 system boot block performs differently for each of two major types of PNX1300 system, distinguished by host-assisted and autonomous bootstrapping. The most significant bit of the tenth byte in the external EEPROM determines the system boot procedure and must match the system configuration.

In host-assisted bootstrapping, a PNX1300 device is integrated into a system where some other processor serves as the host. For example, a PNX1300 chip might be part of a PCI card in a standard personal computer (PC). In this case, the PNX1300 system boot only needs to load enough information from the serial EEPROM to configure the on-chip timing circuits and MMI; the host processor can perform all other PNX1300 setup chores.

In the second type of system, autonomous bootstrapping takes place. In this configuration, a PNX1300 device serves as the host (main) processor; consequently, the PNX1300 system boot must perform more work. In addition to configuring on-chip timing and the MMI, the system boot must set the base addresses of the main memory and MMIO address apertures and load into main memory a level 1 bootstrap program for the DSPCPU.

Only the first 10 bytes of the serial EEPROM are needed when PNX1300 is not the host PCI processor; thus, such systems can use a very low-cost 128-byte EEPROM device. When PNX1300 serves as the system's host processor, the boot logic permits almost 2 KB of storage for the level 1 bootstrap DSPCPU program in a single eight-pin EEPROM device.

[Figure 13-1. The system boot logic uses the I²C interface to access a serial EEPROM that contains main-memory and system timing information. Shown: the PNX1300 system boot block and I²C interface, the serial EEPROM, the SCL and SDA lines, two 4.7-kΩ resistors, and VDD.]

Table 13-1. System boot features

  Characteristic: Boot configurations supported
    • Host assisted, e.g., PNX1300 is a PCI slave in a standard PC.
    • Autonomous, e.g., PNX1300 is the host PCI processor.

  Characteristic: ROM device types supported
    • Single standard I²C serial EEPROMs from 128 bytes to 2 KB in size.
    • EEPROMs connect via the PNX1300 built-in 2-wire I²C interface.
    • The use of EEPROMs with hardware write protect (WP) is recommended. A jumper on WP allows user control over in-system reprogramming using the I²C interface.
    • The EEPROM must respond to I²C device address 1010.

  Characteristic: ROM device examples
    • Atmel 24C01A (128 bytes, WP)
    • Atmel 24C08 (1 KB, WP)
    • Atmel 24C16 (2 KB, WP)

  Characteristic: ROM size
    • From 128 bytes to 2 KB (one device) for initial program load.
13.2 Boot Hardware Operation

The PNX1300 boot sequence begins with the assertion of the reset signal TRI_RESET#. After reset is de-asserted, only the system boot block, I²C, and PCI interfaces are allowed to operate. In particular, the DSPCPU and the internal data highway bus remain in the reset state until they are explicitly released during the boot procedure. In autonomous boot, the system boot block is responsible for releasing the DSPCPU and highway from reset. In host-assisted boot, the boot logic releases the highway from reset and the PNX1300 software driver (which runs on the host processor) releases the DSPCPU from reset. The system boot block operation is illustrated in the flow chart shown in Figure 13-2.

13.2.1 Boot Procedure Common to Both Autonomous and Host-Assisted Bootstrap

There should be no other I²C master active from reset until the boot EEPROM load completes. The system boot procedure begins by loading a few critical pieces of information from the serial EEPROM. This part of the procedure is common to both autonomous and host-assisted bootstrapping. See Table 13-2 for a summary and Table 13-5 for full bit-accurate EEPROM layout details.

The first byte of the EEPROM is read using a serial clock equal to BOOT_CLK/1000, which is guaranteed to be less than 100 kHz. After reading the first byte, which contains the actual BOOT_CLK rate as well as the EEPROM speed capability, the boot block proceeds to read subsequent bytes at the highest valid speed.

The "number of lines in EEPROM device" bit should be '0' for a 128-byte device and '1' for larger devices.

The SDRAM aperture size should be set to the smallest size that is larger than or equal to the actual size of SDRAM connected to PNX1300.
The SDRAM aperture size information is forwarded to the PCI interface for use in host BIOS configuration, as described in Section 13.3.2, "Stage 2: Host-System PCI Configuration."

The BOOT_CLK speed bits should be set to the frequency of the external clock circuit, rounded up to the nearest listed value; e.g., for an external clock of 40 MHz or 50 MHz the value should be '10'. This field, together with the EEPROM maximum clock speed bit, is used to decide the best possible divider ratio for generation of the I²C clock, as shown in Table 13-3. In addition, the delay actions in Figure 13-2 are taken based on the specified BOOT_CLK value.

The EEPROM maximum clock speed bit is set to match the speed grade of the serial EEPROM device.

The test mode bit should always be set to '0'. It is only set to '1' for factory ATE testing.

The subsystem ID and subsystem vendor ID data have no meaning to the PNX1300 hardware; their meaning is entirely software defined. The value is loaded by the system boot block from the EEPROM and published in the PCI configuration space register at offset 0x2C to provide the 16-bit subsystem ID and subsystem vendor ID values. These values are used by driver software to distinguish the board vendor and product revision information for multiple board products based on the PNX1300 chip. Refer to Section 11.5.12, "Subsystem ID, Subsystem Vendor ID Register," for more information on the choice of values.

The MM_CONFIG and PLL_RATIOS registers control the hardware of the MMI and PNX1300 on-chip clock circuits. These registers are described in detail in Section 12.6, "Memory System Programming." The boot value should be set to reflect the exact capabilities of the actual SDRAM in the system.

The "enable internal PCI_CLK generator" bit determines the PCI_CLK pin operating mode. If this bit is '0', PCI_CLK behaves as in TM-1000 and in normal PCI operation, i.e., it is an input pin that takes the PCI clock from the external world. If this bit is '1', an on-chip clock divider in the XIO logic becomes the source of PCI_CLK, and the PCI_CLK pin is configured as an output. In the latter case, the PCI_CLK frequency can be programmed to a fraction of the PNX1300 highway clock by setting the XIO_CTL register "clock frequency" divider value. Refer to Chapter 22, "PCI-XIO External I/O Bus." Note: this bit must be set if no external PCI clock is supplied.

The "SDRAM prefetchable" bit is copied to the PCI configuration space register DRAM_BASE. It is visible only as bit 3 (P bit) of DRAM_BASE in a PCI configuration read; it is not visible through MMIO access. Its purpose is to tell the PCI host that SDRAM reads cause no side effects. The host may apply optimizations to PCI accesses if this bit is set.

The "autonomous/host-assisted boot" bit determines whether the system boot logic continues reading more information from the EEPROM or halts its operation so that the host can complete system initialization. After the information listed in Table 13-2 has been loaded into PNX1300 registers, an external PCI host processor can finish the initialization of PNX1300. If no external PCI host processor is present, the autonomous/host-assisted boot bit should be set to '1' to allow the system boot logic to load the information described in the next section.

Table 13-2. Information loaded during first part of bootstrapping procedure

  Information                           Size      Interpretation
  Number of lines in EEPROM device      1 bit     0: 128 lines
                                                  1: 256 or more lines
  SDRAM aperture size                   3 bits    000: 1 MB
                                                  001: 1 MB
                                                  010: 2 MB
                                                  011: 4 MB
                                                  100: 8 MB
                                                  101: 16 MB
                                                  110: 32 MB
                                                  111: 64 MB
  BOOT_CLK speed                        2 bits    00: 100 MHz
                                                  01: 75 MHz
                                                  10: 50 MHz
                                                  11: 33 MHz
  I²C clock speed                       1 bit     0: 100 kHz
                                                  1: 400 kHz
  Test mode                             1 bit     0: normal operation
                                                  1: rapid ATE testing
  Subsystem ID                          16 bits   Value is copied to the subsystem ID register in PCI configuration space.
  Subsystem vendor ID                   16 bits   Value is copied to the subsystem vendor ID register in PCI configuration space.
  MM_CONFIG register initialization     20 bits   Value is simply written to the MM_CONFIG register; see Section 12.6.1, "MM_CONFIG Register."
  PLL_RATIOS register initialization    8 bits    Value is simply written to the PLL_RATIOS register; see Section 12.6.2, "PLL_RATIOS Register."
  Autonomous/host-assisted boot         1 bit     0: host-assisted
                                                  1: autonomous
  Enable internal PCI_CLK               1 bit     0: PCI_CLK taken from outside
                                                  1: use on-chip XIO PCI_CLK clock generator
                                                  Note: must be set if no external PCI clock is supplied.
  SDRAM prefetchable                    1 bit     0: not prefetchable
                                                  1: prefetchable

Table 13-3. I²C speed as a function of EEPROM byte 0

  BOOT_CLK bits   EEPROM speed bit   Divider value   Actual I²C speed
  00 (100 MHz)    0 (100 kHz)        1008            99.2 kHz
  00              1 (400 kHz)        256             390.6 kHz
  01 (75 MHz)     0 (100 kHz)        752             99.7 kHz
  01              1 (400 kHz)        192             390.6 kHz
  10 (50 MHz)     0 (100 kHz)        512             97.6 kHz
  10              1 (400 kHz)        128             390.6 kHz
  11 (33 MHz)     0 (100 kHz)        336             98.2 kHz
  11              1 (400 kHz)        96              343.8 kHz
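The field-selection rules of Section 13.2.1 and the divider values of Table 13-3 lend themselves to a small calculation. The following sketch is illustrative: the function names are hypothetical, and the I²C speed it reports assumes that the external clock runs at exactly the nominal BOOT_CLK frequency encoded in the EEPROM (a slower external clock, such as 40 MHz with code '10', gives a proportionally slower I²C clock).

```python
# BOOT_CLK speed field (2 bits) -> nominal frequency in MHz (Table 13-2)
BOOT_CLK_MHZ = {0b00: 100, 0b01: 75, 0b10: 50, 0b11: 33}

# (BOOT_CLK bits, EEPROM speed bit) -> divider value (Table 13-3)
I2C_DIVIDER = {
    (0b00, 0): 1008, (0b00, 1): 256,
    (0b01, 0): 752,  (0b01, 1): 192,
    (0b10, 0): 512,  (0b10, 1): 128,
    (0b11, 0): 336,  (0b11, 1): 96,
}

# SDRAM aperture size field (3 bits), indexed by code, in MB (Table 13-2)
APERTURE_MB = [1, 1, 2, 4, 8, 16, 32, 64]


def boot_clk_bits_for(external_mhz):
    """Pick the code of the smallest listed frequency >= the external clock."""
    candidates = [(mhz, bits) for bits, mhz in BOOT_CLK_MHZ.items()
                  if mhz >= external_mhz]
    if not candidates:
        raise ValueError("external clock above the largest listed BOOT_CLK")
    return min(candidates)[1]


def i2c_speed_khz(boot_clk_bits, eeprom_speed_bit):
    """Nominal I2C clock in kHz: BOOT_CLK divided by the table divider."""
    divider = I2C_DIVIDER[(boot_clk_bits, eeprom_speed_bit)]
    return BOOT_CLK_MHZ[boot_clk_bits] * 1000.0 / divider


def aperture_code(sdram_mb):
    """Smallest aperture code whose size is >= the installed SDRAM size."""
    for code, size_mb in enumerate(APERTURE_MB):
        if size_mb >= sdram_mb:
            return code
    raise ValueError("SDRAM larger than the 64-MB maximum aperture")


if __name__ == "__main__":
    bits = boot_clk_bits_for(40)
    print(f"BOOT_CLK bits for 40 MHz: {bits:02b}")
    print(f"I2C speed, 400-kHz EEPROM: {i2c_speed_khz(bits, 1):.1f} kHz")
    print(f"aperture code for 6 MB of SDRAM: {aperture_code(6):03b}")
```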
[Figure 13-2. Flow chart of system boot procedure for both host-assisted and autonomous configurations.

Actions shown, in the order they appear in the chart:
• TRI_RESET# de-asserted.
• 8-bit serial read: 1 bit EEPROM capacity, 3 bits DRAM aperture size, 2 bits PNX1300 clock speed, 1 bit I²C clock rate, 1 bit test mode control.
• Write to EEPROM size register.
• Write aperture size to DRAM_ROUND_SIZE size register in PCI BIU.
• Write to PNX1300 clock speed register.
• 32-bit serial read; write to subsystem ID registers in PCI BIU.
• Write 20 bits to MM_CONFIG register in MMI.
• Write to PLL_RATIOS register in MMI.
• Disable MMI_RESET to activate highway.
• Autonomous boot? If no, system boot halts (the host driver will complete the boot procedure). If yes, continue.
• Save 11-bit byte count.
• Write to MMIO space: MMIO_BASE.
• Write to MMIO space: DRAM_BASE.
• Write to MMIO space: DRAM_CACHEABLE_LIMIT.
• While byte count is not 0: write 32 bits of code to SDRAM over the highway with all byte enables active, then execute 15 dummy writes on the highway to meet the MMI protocol; decrement byte count by four.
• When byte count is 0: write to MMIO space to disable CPU_RESET. The DSPCPU starts execution at DRAM_BASE in big-endian mode. System boot halts.

Serial reads labeled in the chart: 24-bit, 8-bit, 8-bit, 64-bit, 8-bit, 64-bit, 64-bit, 32-bit, 32-bit.
Delays labeled in the chart: wait 400 µs for PLLs to lock; wait ca. 0.6 ms for I²C to stabilize.]
13.2.2 Initial DSPCPU Program Load for Autonomous Bootstrap

In a system where PNX1300 serves as the host CPU, the system boot block performs an autonomous boot procedure. For an autonomous boot, the system boot block reads all the information described in Section 13.2.1, "Boot Procedure Common to Both Autonomous and Host-Assisted Bootstrap," and then, because the autonomous boot bit is set, continues reading information from the EEPROM. After this part of the system boot procedure is done, the DSPCPU starts executing. See Table 13-4.

The DSPCPU bootstrap program byte count encodes the number of bytes of DSPCPU program code contained in the EEPROM(s). This 11-bit unsigned byte count can encode up to 2048 bytes, which is also the maximum amount of EEPROM storage supported. The actual amount of EEPROM available for the DSPCPU bootstrap program is limited to 2000 bytes: other information consumes 47 bytes, and the DSPCPU code must be an integral number of 32-bit words.

Four pairs of 32-bit MMIO-register addresses and values follow the bootstrap program byte count. Each address tells the boot block where in the 32-bit DSPCPU address space to store the corresponding 32-bit value.

The first pair initializes MMIO_BASE. MMIO_BASE sets the base address of the 2-MB MMIO-register address aperture within the DSPCPU 32-bit address space. All MMIO registers are addressed using an offset that is relative to the value of MMIO_BASE. For this pair, the address is required to be 0xEFF00400 because that is the default MMIO_BASE enforced when PNX1300 is reset. The new value for MMIO_BASE is encoded in the corresponding value.

The DRAM_BASE address/value pair determines the base address of the SDRAM address aperture within the 32-bit DSPCPU address space. The address must be equal to 0x100000 plus the new value of MMIO_BASE set previously in the boot procedure.
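The rules stated so far for the autonomous-boot header can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names are hypothetical, and the MMIO_BASE and DRAM_BASE values used in the usage example are arbitrary placeholders, not recommended settings.

```python
DEFAULT_MMIO_BASE_ADDR = 0xEFF00400  # address required for the first pair
DRAM_BASE_OFFSET = 0x100000          # DRAM_BASE address = MMIO_BASE + offset
MAX_EEPROM_BYTES = 2048              # largest supported EEPROM
HEADER_BYTES = 47                    # bytes consumed by other information


def max_program_bytes():
    """Largest bootstrap program: remaining room rounded down to whole words."""
    room = MAX_EEPROM_BYTES - HEADER_BYTES
    return room - room % 4


def check_byte_count(n):
    """Validate a bootstrap program byte count (whole 32-bit words, in range)."""
    if n % 4 != 0:
        raise ValueError("program must be a whole number of 32-bit words")
    if not 0 <= n <= max_program_bytes():
        raise ValueError("program does not fit in the EEPROM")
    return n


def first_two_pairs(new_mmio_base, dram_base):
    """(address, value) pairs for MMIO_BASE and DRAM_BASE, in EEPROM order."""
    return [
        (DEFAULT_MMIO_BASE_ADDR, new_mmio_base),
        (new_mmio_base + DRAM_BASE_OFFSET, dram_base),
    ]


if __name__ == "__main__":
    print(max_program_bytes(), "bytes =", max_program_bytes() // 4, "words")
    # Placeholder values chosen only to show the address arithmetic.
    for addr, value in first_two_pairs(0xEFE00000, 0x00000000):
        print(f"write 0x{value:08X} to 0x{addr:08X}")
```

Note that the second pair's address depends on the value written by the first pair, which is why the order of the pairs in the EEPROM matters.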
the dram_base value must be naturally aligned given the rounded dram aperture size, i.e. a 6 mb dram aperture should start on a 8 mb address multiple. the dram_limit address/value pair determine the ex- tent of the sdram address aperture. the address must be equal to 0x100004 plus the new value of mmio_base set previously in the boot procedure. the value in dram_limit should be 1 higher than the ad- dress of the last valid byte of sdram memory, and must be a 64 kb multiple. the dram_cacheable_limit address/value pair de- termine the extent of the cacheable aperture of the sdram address space. the address must be equal to 0x100008 plus the value of mmio_base set previously in the boot procedure. th e cacheable aperture always begins at the address valu e in dram_base; the value in dram_cacheable_limit is one higher than the address of the last byte of cacheable sdram memory, and must be a 64 kb multiple. it is safe to initially set the value of dram_cacheable_limit equal to dram_limit. the rtos can, if desired, change the val- ue later. the next 32-bit value in bo ot eeprom memory is a copy of the dram_base value enc oded previously. the sys- tem boot hardware loads the dspcpu bootstrap pro- gram into sdram starting at dram_base. the bytes of the dspcpu bootstrap program follow the copy of the sdram_base value. the bootstrap pro- gram can consist of up to 500 32-bit words of dspcpu table 13-4. 
information loaded during second part of bootstrapping procedure for autonomous boot information size interpretation dspcpu bootstrap pro- gram byte count n 11 bits up to 500 32-bit words (2048 bytes less 47 header bytes) mmio_base address 32 bits value must be 0xeff00400 mmio_base value 32 bits value is simply written to 0xeff00400 to determine new base address of 2-mb mmio register aperture within 32-bit dspcpu address space dram_base address 32 bits mmio_base + 0x100000 dram_base value 32-bits value is simply written to dram_base to determine base address of sdram aperture within 32-bit dspcpu address space dram_limit address 32-bits mmio_base + 0x100004 dram_limit value 32-bits value is simply written to dram_limit to deter- mine limit address of sdram aperture within 32-bit dspcpu address space dram_cacheable_ limit address 32-bits mmio_base + 0x100008 dram_cacheable_ limit value 32-bits value is si mply written to dram_cacheable_lim it to determine limit address of cacheable part of sdram aperture within 32-bit dspcpu address space dram_base value 32-bits copy of the dram_base; must be equal to value specified above sdram code word 0 32-bits firs t 32-bit word of initial dspcpu bootstrap pro- gram sdram code word 1 32-bits second 32-bit word of ini- tial dspcpu bootstrap program . . . . . . . . . sdram code word n /4 32 bits last 32-bit word of initial dspcpu bootstrap pro- gram
pnx1300/01/02/11 data book philips semiconductors 13-6 preliminary specification instructions. the byte count mu st be a multiple of four. note that the bytes are stored in the eeprom in a byte swapped order per group of 4 compared to sdram, as detailed in table 13-5 . after the entire dspcpu bootstrap program is loaded into sdram at dram_base, the system boot logic re- leases the dspcpu from the reset state. at this point, the dspcpu begins execut ing the bootstrap program starting at dram_base and pnx1300 is fully operation- al. at the same time, the boot logic releases the i 2 c inter- face. 13.3 host-assisted boot description for a host-assisted bootstr ap, the complete bootstrap process consists of three di stinct stages, but the system boot hardware performs only the first stage. the other two stages are the responsi bility of the host system. 13.3.1 stage 1: pnx1300 system boot hardware in the first stage, the pnx1300 hardware must be initial- ized enough to allow the host system to query and ma- nipulate pnx1300 resource s. the system boot hard- ware, using the procedure described above in section 13.2.1, ?boot procedure common to both autonomous and host-assisted bootstrap,? initializes the subsystem id, subsystem vendor id, mm_config, and pll_ratios registers, waits for the plls to lock, en- ables the internal highway and mmi, but leaves the dspcpu in the reset state. after this minimal initializa- tion, the host system can fi nish the bootstrap process. at the completion of stage 1, the pnx1300 hardware is ready to respond to pci configuration space accesses, and the boot block has released the i 2 c interface. 13.3.2 stage 2: host-system pci configuration stage 2 is carried out either by the host-system pci bios or by a combination of the bios and the host op- erating system (e.g., windows 95). during this stage, the host system configures all pci-bus clients. the pci-bus configuration consists of querying the bus clients to determine the following: ? 
- the number of pci base-address registers implemented by each client. for pnx1300, the number of pci base-address registers is always two (mmio_base and dram_base).
- the size of each aperture associated with the base-address registers. for pnx1300, the size of the mmio aperture is always 2 mb. the size of the sdram aperture can range from 1 mb to 64 mb, and the size must be a power of two (seven distinct sizes).

using this information, the host system relocates each address aperture to eliminate overlaps in the pci address space. the host system accomplishes the relocation by considering each aperture's size and then writing an appropriate starting address to each base-address register. for pnx1300, the base addresses of the mmio and sdram apertures must be relocated in this way. note that in the case of autonomous boot, this relocation is done statically by the system boot hardware when it simply copies the values of mmio_base and dram_base from the serial eeprom into these registers.

the steps of the pci protocol for determining the size of an address aperture are as follows (see section 11.5.11, "base address registers," for a more complete discussion):

- the host writes a 32-bit word of all '1's (0xffffffff) to the base-address register.
- the host reads the base-address register immediately after the write. the value returned will have '0's in all don't-care bits and '1's in all required address bits. the required address bits form a left-aligned (i.e., starting at the most-significant bit) contiguous field of '1's.
- this left-aligned field of '1's effectively specifies the size of the address aperture by indicating the bits of the base-address register that are significant for relocation. that is, an address aperture of size 2^n can only begin on a 2^n-byte-aligned boundary.

as an example, consider the case of the mmio aperture. the host will perform the following steps during stage 2 of the bootstrap process:

- write 0xffffffff to mmio_base.
- read from mmio_base, which returns the value 0xffe00000. the host sees that this value has an 11-bit left-aligned field of '1's, which indicates that the aperture can only be relocated on 2-mb boundaries; thus, the aperture size is 2 mb.
- write a new value to mmio_base with the top 11 bits set to relocate the mmio aperture to a 2-mb region of pci address space that does not conflict with other pci address apertures.

at the completion of stage 2, the pnx1300 hardware is ready to respond to host configuration space accesses, host mmio accesses and host sdram aperture accesses. the dspcpu is still in reset state.

13.3.3 stage 3: pnx1300 driver executing on the host

during the final stage of the bootstrap process, the pnx1300 software driver executing on the host system will write to sdram a program for the dspcpu, and initialize any mmio registers. when the initial program load is complete, the driver releases the dspcpu from its reset state by a write to the biu_ctl register with the cr bit set. see chapter 11, "pci interface." now, with the dspcpu and host both running, the pnx1300 bootstrap process is complete.
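The aperture-sizing handshake of stage 2 can be sketched in a few lines. This is an illustration only: `SimulatedBar` is a hypothetical stand-in for a base-address register, not a model of the PNX1300 BIU, and the masking of the low four bits assumes the usual PCI memory-BAR convention that those bits are type flags rather than address bits.

```python
class SimulatedBar:
    """Toy base-address register for an aperture of `size` bytes (power of two)."""

    def __init__(self, size: int) -> None:
        if size <= 0 or size & (size - 1):
            raise ValueError("aperture size must be a power of two")
        self.size = size
        self.value = 0

    def write(self, value: int) -> None:
        # Only the address bits above the aperture size are writable.
        self.value = value & ~(self.size - 1) & 0xFFFFFFFF

    def read(self) -> int:
        return self.value


def leading_ones(value: int) -> int:
    """Length of the left-aligned field of 1s in a 32-bit value."""
    count = 0
    for bit in range(31, -1, -1):
        if (value >> bit) & 1:
            count += 1
        else:
            break
    return count


def aperture_size(readback: int) -> int:
    """Aperture size implied by the value read back after writing 0xFFFFFFFF."""
    mask = readback & 0xFFFFFFF0          # assumption: low 4 bits are flag bits
    if mask == 0:
        raise ValueError("register not implemented")
    return (~mask + 1) & 0xFFFFFFFF


bar = SimulatedBar(2 * 1024 * 1024)       # the 2 MB MMIO aperture
bar.write(0xFFFFFFFF)
readback = bar.read()                      # 0xFFE00000, as in the text
size = aperture_size(readback)             # 0x00200000 bytes
```

The two's-complement step (`~mask + 1`) is the conventional way to turn the field of writable address bits into a byte count; counting the leading ones gives the same answer as 2 to the power (32 minus the count).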
13.4 detailed eeprom contents

table 13-5 shows the serial eeprom contents needed for an autonomous boot procedure. for the host-assisted boot procedure, only the contents up to line nine are needed. note that the 32-bit words in the serial eeprom are not stored on 32-bit word-aligned addresses.

table 13-5. serial boot eeprom contents

line        data byte (bit 7 ... bit 0)
0           bit 7: #lines (0: 128 lines, 1: 256 or more lines)
            bits 6-4: sdram size[2:0] (000: 1mb, 001: 1mb, 010: 2mb, 011: 4mb, 100: 8mb, 101: 16mb, 110: 32mb, 111: 64mb)
            bits 3-2: boot_clk[1:0] (00: 100 mhz, 01: 75 mhz, 10: 50 mhz, 11: 33 mhz)
            bit 1: eeprom clock (0: 100 khz, 1: 400 khz)
            bit 0: test mode (0: normal, 1: rapid ate)
1           subsystem id, 8 msb
2           subsystem id, 8 lsb
3           subsystem vendor id, 8 msb
4           subsystem vendor id, 8 lsb
5           bits 7-4: not used; bits 3-0: mm_config[19:16]
6           mm_config[15:8]
7           mm_config[7:0]
8           pll_ratios[7:0]: bit 7: sdram pll bypass; bit 6: sdram pll disable; bit 5: cpu pll bypass; bit 4: cpu pll disable; bit 3: sdram ratio; bits 2-0: cpu ratio[2:0]
9           bit 7: boot type (0: host assist., 1: autonomous); bit 6: enable internal pci_clk; bit 5: sdram prefetchable (0: no, 1: yes); bits 4-3: not used; bits 2-0: byte count [10:8]
10          byte count [7:0]
11          mmio_base address [31:24] (must be 0xef)
12          mmio_base address [23:16] (must be 0xf0)
13          mmio_base address [15:8] (must be 0x04)
14          mmio_base address [7:0] (must be 0x00)
15-18       mmio_base value [31:24], [23:16], [15:8], [7:0]
19-22       dram_base address [31:24], [23:16], [15:8], [7:0] (must be bytes 3, 2, 1, 0 of mmio_base + 0x100000)
23-26       dram_base value [31:24], [23:16], [15:8], [7:0]
27-30       dram_limit address [31:24], [23:16], [15:8], [7:0] (must be bytes 3, 2, 1, 0 of mmio_base + 0x100004)
31-34       dram_limit value [31:24], [23:16], [15:8], [7:0]
35-38       dram_cacheable_limit address [31:24], [23:16], [15:8], [7:0] (must be bytes 3, 2, 1, 0 of mmio_base + 0x100008)
39-42       dram_cacheable_limit value [31:24], [23:16], [15:8], [7:0]
43-46       repeat of dram_base value [31:24], [23:16], [15:8], [7:0]
47          byte 0 of dspcpu bootstrap program (stored at dram_base + 3)
48          byte 1 of dspcpu bootstrap program (stored at dram_base + 2)
49          byte 2 of dspcpu bootstrap program (stored at dram_base + 1)
50          byte 3 of dspcpu bootstrap program (stored at dram_base + 0)
...         ...
j + 47      byte j of dspcpu bootstrap program (stored at dram_base + (4 x (j div 4) + (3 - (j mod 4))))
...         ...
(n-1) + 47  last byte of dspcpu bootstrap program (bits [7:0] of last 32-bit word, stored at dram_base + n - 4)
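As a rough illustration of lines 11 onward in table 13-5, the sketch below assembles the four address/value pairs, the repeated DRAM_BASE value, and the byte-swapped program bytes. The function names (`sdram_offset`, `autonomous_tail`) and the example addresses are hypothetical; lines 0 to 10 are left out, and the only checks performed are the ones the text states (multiple-of-four byte count, 2000-byte limit, 64 KB multiples for the two limit values).

```python
import struct

MMIO_BASE_RESET_ADDR = 0xEFF00400   # address of MMIO_BASE after reset (table 13-4)
MAX_PROGRAM_BYTES = 2000            # 500 32-bit words


def sdram_offset(j: int) -> int:
    """SDRAM offset from DRAM_BASE at which EEPROM program byte j is stored."""
    return 4 * (j // 4) + (3 - (j % 4))


def autonomous_tail(mmio_base: int, dram_base: int, dram_limit: int,
                    cacheable_limit: int, image: bytes) -> bytes:
    """EEPROM lines 11 onward for a program `image` given in SDRAM address order."""
    if len(image) % 4 or len(image) > MAX_PROGRAM_BYTES:
        raise ValueError("program must be a multiple of 4 bytes, at most 2000")
    if dram_limit % 0x10000 or cacheable_limit % 0x10000:
        raise ValueError("limit values must be 64 KB multiples")
    pairs = [
        (MMIO_BASE_RESET_ADDR, mmio_base),
        (mmio_base + 0x100000, dram_base),
        (mmio_base + 0x100004, dram_limit),
        (mmio_base + 0x100008, cacheable_limit),
    ]
    out = bytearray()
    for address, value in pairs:
        out += struct.pack(">II", address, value)   # most significant byte first
    out += struct.pack(">I", dram_base)             # repeat of DRAM_BASE value
    # Reverse each group of four so that EEPROM byte j lands at sdram_offset(j).
    for i in range(0, len(image), 4):
        out += image[i:i + 4][::-1]
    return bytes(out)


image = bytes(range(8))                             # two dummy 32-bit words
tail = autonomous_tail(0xA0000000, 0x80000000, 0x80800000, 0x80800000, image)
```

The tail produced here starts at EEPROM line 11; the 36 header bytes it contains plus the 11 bytes of lines 0 to 10 account for the 47 header bytes mentioned in section 13.2.2.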
13.5 eeprom access protocols

figure 13-3 shows the sda (serial data) line protocols for three types of read accesses supported by i2c serial eeproms. a read from the address currently latched inside the eeprom can be for either a single byte or for an arbitrary series of sequential bytes. the master makes the choice by setting the ack bit after a byte has been transferred.

a random-access read is accomplished by performing a dummy write, which overwrites the latched address stored inside the eeprom. once the internal address latch is set to the desired value, one of the other two read protocols can be used to read one or more bytes.

the boot logic inside pnx1300 uses a single random read transaction to location 0 of device address 1010000 followed by a sequential read extension to read all required eeprom bytes in a single pass.

[figure 13-3 not reproduced. it shows three sda line protocols: random read (start, device address with write, word address wa7-wa0 as a dummy write, start, device address with read, data byte d7-d0, no ack, stop); sequential read (start, device address with read, data n, data n+1, data n+2, data n+3, each acknowledged except the last, stop); and current-address read (start, device address with read, data n, no ack, stop).]

figure 13-3. protocols supported by the boot block for reading the eeprom
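The combined transaction the boot logic uses (one random read to location 0, extended by sequential reads) can be written out as a list of bus events. This is a sketch under stated assumptions: the event names are invented for illustration, only the master's ACK/NO_ACK after each data byte is shown (the EEPROM's own acknowledges are omitted), and the 8-bit address bytes follow the usual I2C convention of a 7-bit device address followed by the read/write bit.

```python
EEPROM_DEVICE_ADDRESS = 0b1010000   # 7-bit device address given in the text


def boot_read_transaction(num_bytes: int, word_address: int = 0) -> list:
    """Events for a random read at `word_address` extended to `num_bytes` bytes."""
    if num_bytes < 1:
        raise ValueError("at least one byte must be read")
    events = [
        ("START", None),
        ("ADDR+W", (EEPROM_DEVICE_ADDRESS << 1) | 0),   # dummy write begins
        ("WORD_ADDR", word_address & 0xFF),             # sets the address latch
        ("START", None),                                # repeated start
        ("ADDR+R", (EEPROM_DEVICE_ADDRESS << 1) | 1),
    ]
    for index in range(num_bytes):
        is_last = index == num_bytes - 1
        # The master acknowledges every byte except the last one it wants.
        events.append(("DATA", "NO_ACK" if is_last else "ACK"))
    events.append(("STOP", None))
    return events


events = boot_read_transaction(3)
```

Withholding the acknowledge on the final byte is what ends the sequential read, which is the "master makes the choice by setting the ack bit" behavior described above.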
chapter 14: image coprocessor

14.1 image coprocessor overview

in this document, the generic pnx1300 name refers to the pnx1300 series, or the pnx1300/01/02/11 products.

the image coprocessor (icp) connects to the pnx1300 on-chip data highway to perform sdram block read and write actions. it also connects to the pci interface to allow block write transactions across pci. the major functions of the icp are:

- filter an image by reading the image from sdram and writing the image back to sdram, while applying a user-defined polyphase filter with optional horizontal up- or down-scaling.
- filter an image by reading the image from sdram and writing the image back to sdram, while applying a user-defined polyphase filter with optional vertical up- or down-scaling.
- filter an image and convert it from planar to rgb or yuv composite by reading the image from sdram and writing the image out to pci bus memory (graphics card) or sdram, while performing horizontal scaling and conversion to one of several rgb or yuv formats. the programmer can add optional bitmap masking to selectively enable/disable pixel writes to pci (to refresh only the exposed part of a video window) and an optional image overlay with alpha blending and optional chroma keying (pci output only).
- move an image by reading the image from sdram and writing it back to sdram.

all of the icp functions move and transform data from memory to memory or memory to the pci bus. hence, the dspcpu can use the icp in a time-sharing fashion to simultaneously achieve:

1. vertical and horizontal resizing/subsampling on the image stream from the video in (vi) unit.
2. vertical and horizontal resizing/upsampling on the image stream sent to the video out (vo) unit.
3. presentation of a collection of live video windows with programmable up and down scaling and arbitrary overlap configuration on pci graphics cards.
note 1 (to the numbered list above): functions 2 and 3 don't normally occur simultaneously, and if an application attempts both simultaneously, some performance limitations are incurred.

full 2d scaling and filtering requires two passes over the data: one for horizontal scaling and filtering and one for vertical scaling and filtering.

figure 14-1 shows a block diagram of the pnx1300 with the icp. figure 14-2 shows a block diagram of the internal structure of the icp. the icp contains a 5-tap filter, yuv to rgb converter, an overlay and alpha blending unit, and an output formatter. these blocks communicate with each other through fifos that also buffer the block data to and from the pnx1300 data highway. the icp uses a microprogram-controlled sequencer to control its internal timing. the program for this sequencer is in a table in sdram. the icp reads the appropriate portion from the sdram each time the icp is commanded to perform a function. microprogram control simplifies and minimizes the icp hardware and increases the flexibility of the icp to perform additional tasks without adding hardware.

14.2 requirements

14.2.1 functions

the major functions of the icp include:

1. read an image from sdram and write the image back to sdram, while applying a user-defined polyphase filter with optional up or down scaling in horizontal direction.
2. read an image from sdram and write the image back to sdram, while applying a user-defined polyphase filter with optional up or down scaling in vertical direction.
3. read an image from sdram and write the image out to pci bus memory (graphics card) or sdram, while performing horizontal scaling and conversion to one of several rgb and yuv formats. the pci output mode includes optional bitmap masking to selectively enable/disable pixel writes to pci (to refresh only the exposed part of a video window) and optional rgb overlay with alpha blending and optional chroma keying.

14.2.2 bandwidth

icp bandwidth can be estimated from the worst-case image processing bandwidth. if the worst case image is 1024 x 768 at 30 hz in yuv 4:2:2 format, the pixel rate is 1024 x 768 x 30 = 23.59 mpix/sec. for yuv 4:2:2 image coding at 2 bytes per pixel, this is 23.59 x 2 = 47.19 mb/sec. the minimum bandwidth for the icp function is therefore 47.19 mb/sec, or approximately 50 mb/sec.

[figure 14-1 not reproduced. blocks named in the diagram: dspcpu, i$, d$, video dma in, audio dma in, audio dma out, video out, i2c interface, ssi, vld, image coprocessor, memory controller, pci master/slave interface, jtag, clock, sdram highway. external connections named: camera, digital dmsd or raw video, serial digital audio, pci local bus, sdram.]

figure 14-1. pnx1300 chip block diagram

[figure 14-2 not reproduced. blocks named in the diagram: fifo bank, 5-tap filter, microprogram control unit, overlay + alpha blending + chroma keying, yuv => rgb conversion, output formatting + bit masking. data paths named: y, u, v, overlay, bit mask, microcode, to pci, to sdram, pnx1300 data highway.]

figure 14-2. image coprocessor block diagram
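The bandwidth arithmetic of section 14.2.2 is easy to check numerically. The helper below is a hypothetical name; the factor of three for SDRAM traffic is the one this section goes on to use for a vertically and horizontally scaled image sent to PCI, and it is exposed as a parameter because other uses need fewer transfers.

```python
def icp_bandwidth(width: int, height: int, frames_per_sec: int,
                  bytes_per_pixel: int = 2, sdram_transfers: int = 3):
    """Return (pixels/sec, bytes/sec for one pass, bytes/sec over SDRAM)."""
    pixel_rate = width * height * frames_per_sec
    one_pass = pixel_rate * bytes_per_pixel
    return pixel_rate, one_pass, one_pass * sdram_transfers


pixel_rate, one_pass, sdram_total = icp_bandwidth(1024, 768, 30)
# pixel_rate  = 23,592,960 pixels/sec  (23.59 Mpix/sec)
# one_pass    = 47,185,920 bytes/sec   (47.19 MB/sec, rounded up to 50 in the text)
# sdram_total = 141,557,760 bytes/sec  (the text rounds first: 3 x 50 = 150 MB/sec)
```

Note that the data book rounds the single-pass figure up to 50 MB/sec before multiplying, so its 150 MB/sec estimate is slightly conservative compared with the exact product.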
scaling and filtering of the two-dimensional image requires two passes of the image data through the filter, one for vertical and one for horizontal. scaling an image and sending it to the pci bus requires three transfers of the image over the sdram bus: one transfer to read the image for vertical filtering, one transfer to write the filtered data back, and one transfer to read the image for horizontal filtering and output to the pci bus. this means an average sdram bus bandwidth of 3 x 50 = 150 mb/sec for the 1024 x 768 image case described above, assuming a scaling factor of 1.0. a larger or smaller scaling factor means that either the input or output image will be smaller than 1024 x 768. the bandwidths required are determined by the larger of the two images, input or output. this is because all input pixels must be scanned to generate all the output pixels.

14.2.3 image size and scaling

image sizes in the pnx1300 have a nominal range of 16 x 16 to 1024 x 768. sizes smaller than 16 x 16 are possible, but are too small to be recognizable images. images larger than 1024 x 768 (up to 64 k x 64 k) are possible but they cannot be processed in real time and require larger sdram sizes. scaling factors have a nominal range of 1/4 (down scaling by 4) to 4 (upscaling by 4). larger up and down scaling factors are possible, up to 1000 and beyond; however, very large upscaling factors result in a large magnification of a few pixels, and very large down scaling factors give only a few pixels as a result.

14.3 interface

the icp unit has no pnx1300 external pins. it interfaces internally to the data highway and the pci interface.

14.4 data formats

the icp unit accepts input and overlay image data to generate output image data. the icp accommodates a variety of formats for the input, overlay and output data.
these image data formats define the relationship between the y, u, and v or r, g, and b components of the image as they are stored in memory. the icp accepts input image data in planar format, where the y, u and v components are in separate tables in sdram. the various input image data formats differ in the position of the u and v components relative to the y component and the amount of u and v data relative to the y data.

in all modes except the yuv to rgb conversion modes, each icp operation processes one y, u, or v image component. three separate commands are required to process all three components of an image. since each component is scaled and filtered separately, the software defines the image format and format conversion by how it scales each component.

for pixel format conversion for pci or sdram output mode, each output pixel is a combination of rgb or yuv components as defined by the output format. the yuv input data and the rgb or yuv overlay data are combined by the icp hardware pixel by pixel to form the rgb or yuv output pixels. because all three yuv components are simultaneously woven together to create each output pixel, the icp hardware must know the image data format in sdram, defined as how the components of the image data are to be found and combined.

in the yuv to rgb conversion mode, the icp accepts the following input data formats: yuv 4:2:2 co-sited, yuv 4:2:2 interspersed, and yuv 4:2:0. in this mode, the icp will also accept image overlay data when pci output is specified. the icp accepts image overlay data in several combined formats: rgb 24+, rgb 15+, and yuv 4:2:2+. in this mode, the icp generates output data in several rgb and yuv formats. these formats are compatible with a wide variety of pci frame buffers.

14.4.1 image input formats

the icp image input formats define the relative positions of the y component and the u and v components of the input image pixel data.
there are three input formats to the icp: 4:2:2 co-sited, 4:2:2 interspersed, and 4:2:0 interspersed. the 4:2:2 formats have 2 u and 2 v pixels for every 4 y pixels, so the ratio of y to u or v is 2:1. the 4:2:0 format has 1 u and 1 v pixel for every 4 y pixels, so the ratio of y to u or v is 4:1. the input formats are given below. the input formats have a significant impact on the 2-dimensional scaling operation.

14.4.1.1 yuv 4:2:2 co-sited

in the yuv 4:2:2 co-sited format, the u and v pixels coincide with the y pixel on every other pixel, as shown in figure 14-3.

14.4.1.2 yuv 4:2:2 interspersed

in the yuv 4:2:2 interspersed format, the u and v pixels lie between the y pixels on every other pixel of the horizontal line, as shown in figure 14-4.

14.4.1.3 yuv 4:2:0 xy interspersed

in the yuv 4:2:0 interspersed format, the u and v pixels lie between the y pixels on every other pixel of the horizontal line, as shown in figure 14-5.

14.4.1.4 yuv 4:1:1 co-sited

in the yuv 4:1:1 co-sited format, the u and v pixels coincide with the y pixel on every fourth pixel, as shown in figure 14-6.
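Because the ICP takes planar input, the sample ratios above translate directly into the dimensions of the separate Y, U and V tables. The sketch below is an illustration with a hypothetical `plane_sizes` helper. One assumption goes beyond the text: 4:2:0 is treated as subsampled by two both horizontally and vertically, which is the usual reading of that format and matches the stated 4:1 ratio, but the section itself does not spell out the vertical factor.

```python
# (horizontal, vertical) chroma subsampling factors per input format.
SUBSAMPLING = {
    "4:2:2": (2, 1),   # 2 U and 2 V samples per 4 Y samples
    "4:2:0": (2, 2),   # 1 U and 1 V sample per 4 Y samples (assumed 2 x 2)
    "4:1:1": (4, 1),   # chroma on every fourth pixel of a line
}


def plane_sizes(width: int, height: int, fmt: str) -> dict:
    """Width and height of the planar Y, U and V tables for one image."""
    horizontal, vertical = SUBSAMPLING[fmt]
    chroma = (width // horizontal, height // vertical)
    return {"y": (width, height), "u": chroma, "v": chroma}


sizes_422 = plane_sizes(640, 480, "4:2:2")   # U and V are 320 x 480
sizes_420 = plane_sizes(640, 480, "4:2:0")   # U and V are 320 x 240
sizes_411 = plane_sizes(640, 480, "4:1:1")   # U and V are 160 x 480
```

Co-sited and interspersed variants have the same plane sizes; they differ only in where the chroma samples sit relative to the luminance samples, which matters for the filter phase rather than for storage.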
[figures 14-3 to 14-6 not reproduced. each shows a grid of luminance samples with the positions of the chrominance (u,v) samples marked.]

figure 14-3. 4:2:2 co-sited input format
figure 14-4. 4:2:2 interspersed input format
figure 14-5. 4:2:0 xy interspersed input format
figure 14-6. 525-60 yuv 4:1:1 co-sited input format
14.4.2 image overlay formats

the icp accepts image overlay data in three formats, rgb 24+, rgb 15+, and yuv-4:2:2+, as shown in table 14-1. the overlay image format must be the same type as the output image format generated by the icp for the main image. for example, if the output image is one of the rgb formats, the overlay must be one of the two rgb overlay formats, rgb 24+ and rgb 15+. if the output image format is yuv, the overlay format must be in yuv-4:2:2+ format. the formats must be of the same type because the icp does no conversion on the overlay data.

in rgb 24+, pixels are packed 1 pixel/word, and a full byte of alpha information (stored in the most significant byte) is included with each pixel. in rgb 15+, one bit of alpha is included for each pixel. the pixels in the overlay image are packed as 2 pixels per 32-bit word, and the alpha bit is the most significant bit of each half word. in the same manner, the yuv-4:2:2+ format packs two pixels into one 32-bit word, and has one bit of alpha for each pixel. the least significant bit of the u and v components supplies the alpha bit for the y0 and y1 pixels, respectively. the alpha bit in these formats selects between two alpha values stored in the icp, alpha 1 and alpha 0. the alpha 1 and alpha 0 values are loaded from the parameter block when the icp is started.

table 14-1. image overlay formats

format      bits 31-24             bits 23-16             bits 15-8              bits 7-0
rgb 24+     a7-a0                  r7-r0                  g7-g0                  b7-b0
yuv-4:2:2+  y1                     v7-v1 + alpha bit      y0                     u7-u1 + alpha bit
rgb 15+     pixel 1 (bits 31-16): r4-r0, g4-g0, b4-b0     pixel 0 (bits 15-0): r4-r0, g4-g0, b4-b0
            (the most significant bit of each half word is the alpha bit, as described above)

14.4.3 alpha blending codes

image overlay uses alpha blending, which combines the overlay image with the main image according to the alpha value. the alpha value is supplied by the alpha byte in rgb 24+ format and by the alpha registers, alpha 0 and alpha 1, in the other formats. the alpha code format is shown in table 14-2.

table 14-2. alpha blending codes

alpha code  alpha value  image  overlay
00h         0            100%   0%
20h         32           75%    25%
40h         64           50%    50%
60h         96           25%    75%
80h - ffh   128-255      0%     100%

14.4.4 output formats

the output formats are the rgb image formats sent to the pci interface or sdram. these formats are shown in table 14-3.

table 14-3. output data formats

format           word  bits 31-24   bits 23-16   bits 15-8   bits 7-0

4 pixels/word          pixel 3      pixel 2      pixel 1     pixel 0
rgb 8a: 233      1     each byte: r1 r0 g2 g1 g0 b2 b1 b0
rgb 8r: 332      1     each byte: r2 r1 r0 g2 g1 g0 b1 b0

2 pixels/word          pixel 1 (bits 31-16)      pixel 0 (bits 15-0)
rgb 15+          1     each half word: r4 r3 r2 r1 r0 g4 g3 g2 g1 g0 b4 b3 b2 b1 b0
rgb-16           1     each half word: r4 r3 r2 r1 r0 g5 g4 g3 g2 g1 g0 b4 b3 b2 b1 b0

1 pixel/word
rgb 24+          1     a7-a0        r7-r0        g7-g0       b7-b0

packed 4 pixels/3 words
rgb 24-packed    1     b1           r0           g0          b0
                 2     g2           b2           r1          g1
                 3     r3           g3           b3          r2

packed 2 pixels/word
yuv-4:2:2        1     y1           v0           y0          u0

note: b1 = byte 1 of blue = [b7...b0] of pixel 1.
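Table 14-2 can be read as a clamped linear ramp: the overlay weight rises from 0% at code 00h to 100% at code 80h and stays there. The sketch below encodes that reading and also unpacks one RGB 15+ half word. Both function names are hypothetical, and treating alpha codes between the five tabulated points as linear is an assumption; the table only documents the listed values.

```python
def overlay_weight(alpha_code: int) -> float:
    """Fraction of the output taken from the overlay for an 8-bit alpha code."""
    if not 0 <= alpha_code <= 0xFF:
        raise ValueError("alpha code must fit in one byte")
    return min(alpha_code, 0x80) / 0x80      # codes 80h-FFh all mean 100%


def blend(image: int, overlay: int, alpha_code: int) -> int:
    """Blend one 8-bit component of the main image with the overlay."""
    weight = overlay_weight(alpha_code)
    return round(image * (1.0 - weight) + overlay * weight)


def unpack_rgb15_alpha(half_word: int):
    """Split an RGB 15+ half word into (alpha bit, r, g, b)."""
    alpha_bit = (half_word >> 15) & 0x1      # most significant bit
    red = (half_word >> 10) & 0x1F
    green = (half_word >> 5) & 0x1F
    blue = half_word & 0x1F
    return alpha_bit, red, green, blue


mixed = blend(200, 100, 0x40)                # 50% image, 50% overlay
fields = unpack_rgb15_alpha(0b1_00001_00010_00011)
```

In the one-bit formats the unpacked alpha bit does not weight the pixel directly; per section 14.4.2 it selects which of the two stored values, alpha 0 or alpha 1, is used as the alpha code.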
14.5 algorithms

14.5.1 introduction

the icp provides filtering, resizing (scaling) and yuv to rgb conversion of the source image. filtering provides image enhancement. scaling generates a new image that is larger or smaller than the current image. yuv to rgb conversion is used to generate an rgb version of the image for output to an rgb format frame buffer through the pci interface or to sdram.

the filtering, scaling, and yuv to rgb conversion algorithms are discussed separately. the icp uses these algorithms in two ways.

1. it provides one pass horizontal scaling with horizontal 5-tap filtering of y, u, or v.
2. it provides one pass vertical scaling with vertical 5-tap filtering of y, u, or v.

14.5.2 filtering

the icp provides high quality, 5-tap polyphase filtering, both horizontal and vertical, of y, u, or v data. each filter type is performed as a separate one-dimensional filter pass. two-dimensional filtering of the image requires two passes of the one-dimensional filters.

multi-tap fir filtering

in multi-tap fir filtering of an image, the new filter output (pixel) value is a weighted sum of adjacent pixels. the weighting coefficients determine the type of filtering used. a 5-tap filter generates the new pixel value as a weighted sum of the current value and the two pixels on either side (2 left and 2 right for horizontal filtering, 2 above and 2 below for vertical).

a multi-tap fir filter can be used to generate values for new pixels that are displaced from the original ("center") pixel in the same way as linear interpolation. for example, assume the new pixel location is shifted slightly to the right of the center pixel of the input image. a horizontal filter can be used to estimate the new pixel value by weighting the right pixel filter coefficients more heavily than the left, proportional to the relative position offset of the new pixel.
(in this sense, interpolation is a 2-tap filter.) this is shown in figure 14-7. the icp horizontal and vertical filter operations use this method to combine scaling with filtering.

mirroring pixels at the start and end of a line or window

a line may start and/or end at the edge of the input image. in this case, the two start and/or end pixels needed for the first and last pixels of the line, respectively, are missing. the icp uses pixel mirroring to solve this problem. in pixel mirroring, the two available pixels are used to substitute for the two missing pixels. the first pixel uses copies of the two pixels to the right as though they were the two pixels to the left. specifically, p+2 substitutes for p-2, and p+1 substitutes for p-1. the last pixel uses copies of the two pixels to the left as though they were the two pixels to the right. since the left and right pixels are now the same, this is called pixel mirroring.

there are five states of pixel mirroring: first output pixel, second output pixel, middle pixels, next to last output pixel and last output pixel. the first output pixel uses pixels numbered (2, 1, 0, 1, 2). the second pixel uses (1, 0, 1, 2, 3). the middle pixels use (p-2, p-1, p, p+1, p+2). the next to last pixel uses (n-3, n-2, n-1, n, n-1), where n is the number of the last input pixel. the last pixel uses (n-2, n-1, n, n-1, n-2).

in some cases of upscaling, one more input pixel may be needed at the end of the line. in these cases, the pixel value(s) are not generated by the mirror logic. instead, the icp uses a copy of the last output pixel as the best estimate of the required output pixel.

14.5.3 scaling

scaling overview

resizing, or scaling, the image means generating a new image that is larger or smaller than the original. the new image will have a larger or smaller number of pixels in the horizontal and/or vertical directions than the original image.
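Returning briefly to the pixel mirroring rule of section 14.5.2, the five mirroring states collapse into a single reflection of out-of-range indices. The sketch below is an illustration with hypothetical function names, not a description of the ICP hardware; it assumes a line of at least three pixels and uses integer coefficients purely to keep the example exact.

```python
def mirrored_taps(p: int, n: int) -> list:
    """Input pixel numbers used by a 5-tap filter centered on pixel p.

    n is the number of the last input pixel (pixels are numbered 0..n).
    """
    taps = []
    for index in range(p - 2, p + 3):
        if index < 0:
            index = -index            # reflect about pixel 0
        elif index > n:
            index = 2 * n - index     # reflect about pixel n
        taps.append(index)
    return taps


def filter_pixel(line: list, p: int, coefficients: list):
    """Weighted sum of the five (possibly mirrored) pixels around position p."""
    n = len(line) - 1
    return sum(c * line[i] for c, i in zip(coefficients, mirrored_taps(p, n)))


first = mirrored_taps(0, 9)           # (2, 1, 0, 1, 2)
last = mirrored_taps(9, 9)            # (n-2, n-1, n, n-1, n-2)
value = filter_pixel([10, 20, 30, 40, 50], 0, [1, 2, 2, 2, 1])
```

The reflection reproduces every tuple listed in the text, including the second and next-to-last output pixels, where only one of the five taps falls outside the line.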
a larger image is scaling up (more new pixels); a smaller image is scaling down (fewer new pixels).

a simple case is a 2:1 increase or decrease in size. a 2:1 decrease could be done by throwing away every other pixel (although this simple method results in poor image quality). a 2:1 increase is more interesting. the new pixels can be generated in between the old ones by:

1. duplicating the original pixels
2. linear interpolation, where the new in-between pixels are the weighted average of the adjacent input pixels
3. multi-tap filtering, where the new in-between pixels are a multi-pixel filtered version of the adjacent input pixels. this approach results in the best image.

[figure 14-7 not reproduced. it shows a row of input pixels and a row of output pixels, with one output pixel generated by the filter (uses 5 input pixels) and one by interpolation (uses 2 input pixels).]

figure 14-7. pixel generation by interpolation and filtering

the more general case is where the output image resolution is not an integral multiple or sub-multiple of the input image resolution, such as converting from 640 x 480 to 1024 x 768. in this case, the output pixels have differing positions relative to the input pixels in the horizontal or vertical dimensions. in converting from 640 to 1024, the first output pixel on a line corresponds to the first input pixel. the second output pixel is at 640/1024 of the distance between the first and second input pixels. the third output pixel is at (2*640)/1024 = 1280/1024 = 1 + 256/1024 input pixel spacings from the start of the line, i.e. at 256/1024 of the distance between the second and third input pixels, etc. the output pixels shift with respect to the input pixel grid as you move along the line in the horizontal or vertical dimensions. this is shown in figure 14-8.

new pixels are generated by interpolation or filtering of the original pixels. interpolation is the weighted average of the input pixels adjacent to the output pixel. filtering extends interpolation to include input pixels beyond the input pair adjacent to the output pixel. the number of pixels used to generate the output defines the filter type. interpolation is a 2-tap filter. a 4-tap filter would use the two pixels to the left and the two pixels to the right of the output pixel. a 5-tap filter identifies the single pixel nearest the output as the center pixel, and uses this pixel plus two to the left and two to the right to generate the output.

if the ratio of the output pixel count per line (in h or v) to input pixel count per line is the ratio of small integers, there is a repeating pattern in these relative positions of input to output pixel locations.
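The 640-to-1024 walk-through above can be reproduced exactly with rational arithmetic. The helper below is a hypothetical name; it numbers pixels from zero, so "the second input pixel" of the text is index 1.

```python
from fractions import Fraction


def input_position(k: int, input_length: int, output_length: int):
    """Locate zero-based output pixel k on the input grid.

    Returns (index of the input pixel at or to the left, fractional distance
    toward the next input pixel).
    """
    position = Fraction(k * input_length, output_length)
    index = position.numerator // position.denominator
    return index, position - index


first = input_position(0, 640, 1024)    # on the first input pixel
second = input_position(1, 640, 1024)   # 640/1024 = 5/8 past the first
third = input_position(2, 640, 1024)    # 256/1024 = 1/4 past the second
```

Using `Fraction` rather than floating point keeps values such as 256/1024 exact, which makes the repeating pattern discussed next easy to see.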
For example, for 640 to 1024, the ratio is 8/5. The pattern repeats for every 8 output and every 5 input pixels. If the ratio is not a ratio of small integers, the pattern will take a long time to repeat. The worst case would be 640 to 641, for example. There would be no exact repetition for the whole line.

The interpolator or filter coefficients must be weighted according to the position of the new pixel relative to the old pixels. The weighting factor is between 0.0 and 1.0, corresponding to the relative position of the new pixel with respect to the old pixel grid. With a repeating pattern, fewer weighting factors are needed, and therefore fewer coefficients in the linear interpolator or filter generating the new pixels, since you can reuse them each time the pattern repeats. A filter with a repeating pattern is called polyphase, indicating a repeating pattern in the phase (offset position) of the output pixels relative to the input pixels.

Generating the output pixels: relating the output grid to the input grid

Scaling is a pixel transformation in which an array of output pixels is generated from an array of input pixels. The value of each pixel on the output pixel grid is calculated from the values of its adjacent pixels on the input grid. To find these adjacent pixels, you overlay the output grid on the input grid and align the starting pixels, x0y0, of the two grids. To identify the adjacent input pixels for a given output pixel, you divide the output pixel x (pixel number along the output line) and y (pixel line number within the window) by their corresponding scaling factors:

    x_in = x_out / (horizontal scaling factor)
        where: horizontal scaling factor = output length / input length
    y_in = y_out / (vertical scaling factor)
        where: vertical scaling factor = output height / input height

Note that the resulting x_in and y_in values will be real numbers because the output pixels will usually fall between the input pixels.
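The mapping from output to input positions, and the repeating phase pattern for the 640 to 1024 case, can be checked with a short sketch. This is an illustration only: the function names are hypothetical, and exact rational arithmetic is used here for clarity rather than to model the ICP hardware.

```python
from fractions import Fraction

def input_positions(n_in, n_out, count):
    """Return x_in = x_out / scale for the first `count` output pixels."""
    scale = Fraction(n_out, n_in)          # e.g. 1024/640 reduces to 8/5
    return [Fraction(x_out) / scale for x_out in range(count)]

def phase_pattern(n_in, n_out):
    """Return (output period, input period, fractional offsets in one period)."""
    scale = Fraction(n_out, n_in)
    period_out, period_in = scale.numerator, scale.denominator
    # Fractional part of each input position = phase of that output pixel.
    phases = [(Fraction(k) / scale) % 1 for k in range(period_out)]
    return period_out, period_in, phases

period_out, period_in, phases = phase_pattern(640, 1024)
print(period_out, period_in)               # pattern length in output / input pixels
print([str(p) for p in phases])            # the distinct phases that must be supported
```

For 640 to 641 the same function reports a period of 641 output pixels, which is the "no repetition within the line" case described above.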
The fractional portion indicates the fractional distance to the next pixel. To calculate the output pixel value, you use the value for the nearest pixel to the left and above and combine it with the value of the other adjacent pixel(s). For example, horizontal interpolation uses the starting pixel to the left interpolated with the next pixel to the right, with the fractional value used to determine the weighting for the interpolation.

ICP scaling output resolution

In the ICP, scaling is forced to have a repeating pattern by limiting the resolution of the new pixel position to 1/32; the new position is forced to be at a location n/32 in H and V relative to the position of the original pixel grid. This results in a worst case error of approximately 1.5% in amplitude relative to calculations using exact output pixel positions. This is comparable to the errors caused by quantizing the amplitude of the pixels. The additional quantization noise can be avoided by choosing an appropriate scale factor which, when inverted, results in fractional values that can be expressed exactly in 32nds, such as the 8/5 scaling factor in the 640 to 1024 example above.

Figure 14-8. 640 to 1024 upscaling example (5 input pixels map to 8 output pixels)

A diagram of the input to output pixel relationship and the output fractional x and y subpixel offset is shown in Figure 14-9.

Output scaling calculation method

The output pixel distance in H and V in the ICP is calculated to high precision (16-bit fraction) even though the output resolution is fixed at 1/32 of the input grid. Each output pixel's location relative to the input pixel grid is given by:

    x location of output pixel = x0 of input line + output pixel number / x scale factor
    y location of output pixel = y0 of input window + output line number / y scale factor

The x and y locations may not be integer values, depending on the scale factor. The resulting x and y pixel locations can be separated into an integer and a fractional part. The integer part of the x and y location selects the pixel and line number closest to the output pixel, respectively. The fractional part gives the fractional distance of the output pixel to the next x and y input pixel values. These fractional parts are the dx and dy values shown in Figure 14-9.

The output pixel value can be calculated by interpolation between the two input pixels or by 5-tap filtering using the 5 nearest pixels rather than the 2 nearest pixels. Interpolation or filtering uses the fractional position values, dx and dy, to select the appropriate filter coefficients. In the ICP, these values are limited to 5 bits for a resolution of 1/32, even though the actual position value has much higher resolution. The ICP uses fractional values centered around the center pixel with a range of -16/32 to +15/32.

To perform scaling, the x and y locations of the output pixel relative to the input pixel grid must be generated. This includes both the integer part to locate the adjacent pixels and the fractional part to choose the filter coefficients which generate the output value from the adjacent pixels.
This could be done by generating the output pixel x and y numbers and dividing each by its associated scale factor. Since dividing is expensive in hardware and time, the ICP effectively multiplies the x and y pixel numbers by the inverse of the x and y scaling factors, respectively. This is done by incrementing the x and y input pixel counters by x and y increment values that are the inverse of the x and y scale factors, respectively. For output pixel x_n, the inverse of the scale factor is added to the x input location n times. This is equivalent to multiplying n by the inverse of the scale factor.

The ICP uses a 16-bit integer and a 16-bit fractional value for the x and y increment values. This allows a fractional value resolution of 1/64K. Since the increment value will be added 1024 times in a 1024-pixel line, any error in an individual calculation will be multiplied by 1024. The high resolution of the calculation prevents an accumulation of error as you increment along the line.

Only the most significant 5 bits of the fractional value are used by the filter coefficient RAMs. However, the x and y counters are incremented by the high-resolution x and y increment values. The result of this truncation is a worst case error of approximately 1.5% in amplitude relative to arbitrary pixel output positions.

The error caused by discrete (1/32) resolution can be reduced to exactly zero if the output image size is adjusted to have a repeating pattern that fits on these 1/32 boundaries. For zero error, the scaling factor must be of the form b/a, where b (the output pixel count factor) is a sub-multiple of 32 [i.e. 1, 2, 4, 8, 16, 32], and a (the input pixel count factor) is an integer determined by the nearest acceptable scale factor for a given b. In the 640 to 1024 conversion case, the b/a ratio was 8/5, meeting this requirement.

The integer values, if accumulated, would be equal to the total number of input pixels when scaling is complete.
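The increment-and-accumulate scheme described above can be sketched in a few lines. This is a simplified model under stated assumptions, not the ICP logic itself: the function names are hypothetical, the increment is truncated when converted to fixed point, and the 5-bit phase is shown as an unsigned value 0..31 rather than the centered -16..+15 form the ICP uses.

```python
FRAC_BITS = 16    # fractional bits kept in the position accumulator
PHASE_BITS = 5    # fractional bits passed on to the coefficient RAMs

def make_increment(n_in, n_out):
    """Inverse scale factor as a 16.16 fixed-point integer (assumed truncation)."""
    return (n_in << FRAC_BITS) // n_out

def step_along_line(increment, n_out):
    """Return one (phase, pixels_to_shift_in) pair per output pixel."""
    inc_int = increment >> FRAC_BITS
    inc_frac = increment & ((1 << FRAC_BITS) - 1)
    frac = 0
    steps = []
    for _ in range(n_out):
        phase = frac >> (FRAC_BITS - PHASE_BITS)   # top 5 bits select coefficients
        total = frac + inc_frac
        carry = total >> FRAC_BITS                 # carry out of the fraction adder
        frac = total & ((1 << FRAC_BITS) - 1)
        steps.append((phase, inc_int + carry))     # integer part + carry = new pixels
    return steps

steps = step_along_line(make_increment(640, 1024), 1024)
print([p for p, _ in steps[:8]])    # phases repeat every 8 output pixels
print(sum(s for _, s in steps))     # accumulated integer values
```

Because 5/8 is exactly representable in 32nds, every phase in this example lands on a 1/32 boundary, which is the zero-error condition stated above.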
The integer values for each pixel define the number of pixels to read from memory and shift in to generate the next output pixel. For example, a scaling factor of 1.0 will result in one pixel shifted in for each output pixel generated. Upscaling will have integer increment values of less than one. This means that the integer value will be "0" for some pixels and "1" for others. For example, upscaling by 2.0 will result in integer values of "1" half the time and "0" for the other half, depending on the carry out from the fractional increment.

Figure 14-9. ICP 1/32 output resolution (output pixel offsets dx and dy relative to the input pixel grid)

Pixel shift bypassing for large down scaling

Down scaling will have integer increment values of greater than one. In this case, the integer value indicates the number of pixels to read to obtain filter pixels for the next output pixels. There are two ways to read and shift in the pixels for down scaling: shift all and shift bypass. In the shift all mode (the default mode), all five pixels are shifted for each input value read and shifted in. Shift all mode uses the five input pixels nearest the output pixel, independent of scaling factor. In the shift bypass case, only the last pixel is shifted in. For example, in a down scaling of 10, nine pixels are read and the 10th pixel is shifted in to the filter. Shift bypass mode is used for large down scaling, i.e. down scaling factors of 2.0 or greater.

The shift bypass mode is selected by setting the GETB bit in the parameter table. It uses input pixels that are nearest the output pixel and those nearest each of the four output pixels adjacent to the output pixel. The shift bypass mode also forces the coefficient RAM inputs to "0", since interpolation between adjacent input pixels is no longer being performed.

Using scaling to convert from YUV 4:2:0 to YUV 4:2:2

YUV information in the 4:2:0 format has the UV pixels offset from the input grid in both x and y. Also, the U and V pixels are at 1/2 of the horizontal and 1/2 of the vertical frequencies of the Y pixels. This means the UV pixels must be filtered and additionally scaled in both x and y in order to line up with the output Y pixels even if no initial scaling is done. To generate 4:2:2 interspersed data, vertically up-scale U and V by a factor of 2 with a start offset of -1/4 pixel. Upscaling by 2 generates the additional lines required, and starting with a -1/4 pixel offset (relative to U, V space) moves the output up to the same line as the Y pixels. To generate 4:2:2 co-sited data, then filter horizontally with no scaling factor but with a start offset of -1/4 pixel, moving the output left 1/4 pixel.

14.5.4 YUV to RGB conversion

In the ICP, YUV to RGB conversion is done by sequentially processing triplets of Y, U, and V pixel data to convert the pixels to an internal YUV 4:4:4 format and applying the YUV to RGB conversion algorithm on the YUV 4:4:4 pixels. The results of this conversion normally go to the PCI bus but can also go back to SDRAM.

YUV to RGB conversion has two steps. First, the Y, U, and V pixel data are used to generate an RGB pixel at the output location. When the Y, U, and V pixels are ready, YUV to RGB conversion is performed using the following algorithms:

    R = Y + 1.375(V) = Y + (1 + 3/8)(V)
    G = Y - 0.34375(U) - 0.703125(V) = Y - (11/32)(U) - (45/64)(V)
    B = Y + 1.734375(U) = Y + (1 + 47/64)(U)

In CCIR601, the U and V values are offset by +128 by inverting the most significant bit of the 8-bit byte.
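The conversion equations can be written with integer arithmetic, since all three coefficients are exact multiples of 1/64. The sketch below is illustrative only: the function name is hypothetical, and the rounding (floor division) and the clamping of results to 0..255 are assumptions, as the text does not specify how the ICP rounds or saturates.

```python
def yuv_to_rgb(y, u_stored, v_stored):
    """Convert one YUV 4:4:4 pixel to RGB using the coefficients given above.

    u_stored / v_stored are 8-bit values in offset (+128) form.
    """
    # Removing the +128 offset is the same as inverting the MSB and reading
    # the byte as a signed two's complement number.
    u = u_stored - 128
    v = v_stored - 128

    # Coefficients scaled by 64: 1.375 = 88/64, 11/32 = 22/64,
    # 45/64, and 1 + 47/64 = 111/64.
    r = y + (88 * v) // 64
    g = y - (22 * u + 45 * v) // 64
    b = y + (111 * u) // 64

    clamp = lambda c: max(0, min(255, c))   # assumed saturation to 8 bits
    return clamp(r), clamp(g), clamp(b)

print(yuv_to_rgb(128, 128, 128))   # neutral chroma leaves R = G = B = Y
print(yuv_to_rgb(100, 128, 192))   # positive V pushes red up and green down
```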
U and V values are stored in SDRAM in this offset form. The above algorithms assume that the U and V values are converted back to normal signed two's complement values by inverting the MSB before being used.

14.5.5 Overlay and alpha blending

The ICP can add an overlay image to the main image when in the horizontal filter to RGB/YUV mode with PCI output. The overlay image is a user-defined rectangle within the main image. When the overlay is active, each overlay pixel is combined with the corresponding main image pixel to generate the resulting pixel to be displayed. Each pixel combination is controlled by an alpha value which determines the proportions of overlay and main image that contribute to the output pixel. The relation is given by:

    Pout = (alpha) * Poverlay + (1 - alpha) * Pmain
         = (alpha) * (Poverlay - Pmain) + Pmain
    where: alpha ranges from 0 to 1

In the ICP, the alpha value range is limited by the hardware to five values: {0.0, 0.25, 0.50, 0.75, 1.0}.

An alpha value is supplied for each overlay pixel. In the RGB 24+ overlay data format, an 8-bit alpha value is contained within the overlay data. In all other overlay data formats (RGB 15+, etc.), an alpha bit in the overlay data determines the alpha value. The alpha bit selects between two 8-bit values, alpha 1 and alpha 0, supplied by a pair of internal ICP registers. These registers are loaded from the parameter block when the ICP is started. When the alpha bit is "1", the alpha 1 value is used as the alpha value; when the alpha bit is "0", alpha 0 is used as the alpha value. The two alpha registers allow translucent images and backgrounds while being restricted to one bit per pixel for alpha selection.

Alpha blending has several uses.

1. Alpha can be used to disable portions of the overlay, called keying. When the alpha for a pixel is "0", there is no overlay. When the alpha is "1", the overlay is 100%, replacing the image. This allows the user to put an irregularly shaped object in an image without showing the bounding rectangle of the overlay.
2. Alpha blending allows translucent (smoky) backgrounds and/or translucent (ghostly) overlay images.
3. Using alpha at the edges of small images such as font characters increases their effective visual resolution.

Chroma keying

The ICP also optionally provides a restricted form of chroma keying, sometimes called color keying. When the overlay Y value is "0" (an illegal value in the YUV 4:2:2+ format) or the RGB values are all "0" (RGB 15+ format), the alpha value is forced to "0" and no overlay or blending occurs. This provides three levels of overlay: none, alpha zero, and alpha one. This combination can be used to generate an irregularly shaped menu (an oval shape, for example) which is translucent (e.g. an alpha value of 50%) and contains opaque (alpha = 100%) letters. In a game, this could be a message written on a foggy background in an oval window. The chroma keying provides the definition of the oval shape, the alpha zero value defines the translucent foggy background, and the alpha one value defines the opaque characters on the foggy background.

Chroma keying in the ICP is intended for computer generated or modified overlays. Chroma keying turns off the overlay process for selected pixels by forcing an alpha value of "0" for those pixels. Chroma keyed pixels use special codes to identify them. These codes must be computer generated in most cases. For example, the DSPCPU or other CPU would process an overlay image and convert the overlay pixels to be turned off into chroma keyed pixels by changing the data for those pixels to the chroma key code.

The ICP does not have full chroma keying. Full chroma keying has adjustable threshold values for the pixel components. Adjustable thresholds allow the user to automatically select an overlay sub-image from a larger overlay background, such as selecting an image of an actor against a bright blue background while inhibiting the blue background.

14.5.6 Dithering

Short output codes, such as RGB 8, have few bits for output-value determination. RGB 8R has (2,3,3) bits for (R,G,B). The result is a coarse, patchy image if nothing is done to correct for the limited resolution. Dithering significantly improves the effective resolution of these images. For example, RGB 8 images with dithering look nearly as good as RGB 16.

Dithering works by adding a random dithering value to the pixel before it is truncated by the output formatter. The dither is added to the portion which will be truncated. The carry from this add will occasionally propagate into the most significant portion of the pixel before truncation. The carry from the add thus "dithers" the displayed value. In the example shown in Figure 14-10, a random dither value is added to the original data before truncation. The dither value should have a range of approximately 0 to 1 LSB of the truncated value. The dither value should be symmetrical around 1/2 the LSB of the quantizing error of the truncation. In the example shown, the dither signal has values of (1/8, 3/8, 5/8, 7/8). This set of values has a range of approximately 0 to 1 LSB, and it is symmetrical around 1/2 LSB.

In this example, the input signal has a value of 2.83. Without dithering, this value would be truncated to an output value of 2 in all cases. Averaging the un-dithered signal over four pixels still gives you a value of 2. By adding the dither signal, the output value is 2 or 3 depending on the value of the added dither signal. Averaging over four pixels, the average output value is 2.75, much closer to the input value than without the dither signal. The dither signal has significantly reduced the error when averaged over four pixels.
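The arithmetic of this example is easy to reproduce. The sketch below is only a numeric check of the 2.83 case with the four dither values given above; the helper name is hypothetical and exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction
import math

def truncate_with_dither(value, dither_values):
    """Add each dither value to `value`, then truncate to an integer."""
    return [math.floor(value + d) for d in dither_values]

value = Fraction(283, 100)                          # input signal 2.83
dither = [Fraction(k, 8) for k in (1, 3, 5, 7)]     # 1/8, 3/8, 5/8, 7/8

undithered = math.floor(value)                      # always truncates to 2
outputs = truncate_with_dither(value, dither)
average = Fraction(sum(outputs), len(outputs))

print(undithered, float(value - undithered))        # output and error without dither
print(outputs, float(average), float(value - average))
```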
Two types of dithering are combined in the ICP: quad pixel and full image dithering. Quad pixel dithering, also known as ordered dithering, adds one of four dithering values to each pixel. The four dithering values correspond to four-pixel quads in the output image. The pixels in each quad have fixed positions in the input image, so the dither values are chosen on the basis of odd or even line number and odd or even pixel number in the line. The dither values of (0/4, 3/4, 2/4, 1/4) are added by line and pixel number: even line & even pixel, even line & odd pixel, odd line & even pixel, odd line & odd pixel. This gives a four value ordered function for four adjacent pixels in the image. The (0,3,2,1) pattern is chosen specifically to prevent pairs of high or low pixel values from clustering. Spatial dithering provides a significant improvement in effective resolution.

Full image dithering adds a single randomly generated number to every pixel of the image. The result is that the intensity and color accuracy increase as the size of the sample is enlarged. The random number has a long bit length to prevent repeating patterns in the image.

The random number can be static or dynamic. In the static case, the random number generator starts with a fixed seed at the start of the image. The random number spatial pattern is fixed for the image even though the image data may change from frame to frame. In the dynamic case, the random number generator runs continuously, and the dithering pattern changes from frame to frame.

The ICP combines quad pixel dithering with full image dithering to provide the final dithering signal for each pixel. The quad pixel dither provides the two most significant bits of the dither signal, and the full image dither provides the 4 least significant bits of the dither signal. The combined dither signal is 6 bits. From 1 to 6 bits of dither signal are used, depending on the output format.
If fewer than 6 bits are needed, only the MSBs of the dither signal are used. For example, in the RGB 8R output format, the R output value is 3 bits in size. The output uses the 3 MSBs of the R input value and truncates the 5 LSBs. The dither unit adds 5 bits of dither signal (the 5 MSBs) to the 5 LSBs of the R input value before truncation, and the RGB formatter truncates the result after adding.

Figure 14-10. Dithering (input value 2.830; no dithering: output = 2.0, error = +0.830; 1/4 LSB dithering with dither values 1/8, 3/8, 5/8, 7/8: outputs 2, 3, 3, 3, average = 11/4 = 2.750, error = 2.830 - 2.750 = +0.080)
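The combination of the two dither sources can be sketched as follows. This is a rough model under several assumptions that are not stated in the text: the names are hypothetical, Python's `random` module stands in for the ICP's random number generator, and the sum is saturated at 255 so that a carry cannot wrap the 8-bit component.

```python
import random

# Quad pixel (ordered) dither in quarters, indexed [line parity][pixel parity]:
# even/even = 0, even/odd = 3, odd/even = 2, odd/odd = 1.
QUAD = ((0, 3), (2, 1))

def dither6(line, pixel, rng):
    """6-bit dither: quad pixel value in the 2 MSBs, random value in the 4 LSBs."""
    quad = QUAD[line & 1][pixel & 1]
    noise = rng.getrandbits(4)          # stand-in for the full image dither
    return (quad << 4) | noise

def dither_component(value8, out_bits, line, pixel, rng):
    """Reduce an 8-bit component to out_bits (2..7) with dithering."""
    drop = 8 - out_bits                               # LSBs that will be truncated
    d = dither6(line, pixel, rng) >> (6 - drop)       # keep only the dither MSBs
    total = min(value8 + d, 255)                      # assumed saturation
    return total >> drop                              # truncate after adding

rng = random.Random(1)                                # fixed seed: "static" dither
quad_out = [dither_component(176, 3, ln, px, rng) for ln in range(2) for px in range(2)]
print(quad_out, sum(quad_out) / 4)                    # 176/32 = 5.5 is the ideal value
```

With a 3-bit output the five dither MSBs are used, matching the RGB 8R example above. For the input value 176 the carry depends only on the quad pixel bits, so the four pixels of a quad average to the ideal value regardless of the random bits.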
14.5.7 Implementation overview: horizontal scaling and filtering

Figure 14-11 shows a data flow block diagram of the ICP horizontal scaling algorithm implementation. Blocks of pixels are provided by the input block buffer. Each block of pixels is transferred sequentially to the 5-tap filter. The filter does scaling and filtering of the data and puts the resulting pixels in the output buffer. Completed pixels in the output buffer are written back to SDRAM or to the PCI output. A bypass multiplexer allows the filter to be bypassed for SDRAM to SDRAM block moves.

Input pixel access is controlled by the Y counter. The Y counter selects the word and byte for the current pixel in the Y FIFO buffer. The Y increment register, Y LSB register and the Y MSB counter control the increment of the Y counter. If the Y MSB counter contents are not "0", the Y counter is incremented and the Y MSB counter is decremented until the Y MSB counter is "0". The Y MSB counter is loaded with the integer portion of the results of the Y counter increment operation. Y counter increment involves adding the Y increment fraction and integer values to the Y LSB register and Y MSB counter, respectively. If there is no scaling (scaling factor = 1.0), the Y increment integer value will be "1", and the Y increment fractional value will be "0". Each Y counter increment operation will increment the Y counter by one in this case.

The Y counter keeps track of horizontally indexed pixels sent to the filter. The Y counter is incremented once (1.0 for no scaling) for each pixel. For a line of pixels beginning with x_a and ending with x_b, the Y counter reads pixels from the block buffer beginning with x_(a-2) and ending with x_(b+2). The extra pixels are required by the 5-tap filter, which uses a total of 5 pixels to generate each output pixel, two pixels before and two pixels after each pixel.
The horizontal filter uses the current output from the block buffer and four delayed versions of it to generate the filter output as the weighted sum of the center pixel plus the two on either side. (For the case where the scaling factor = 1.0, the LSBs are always "0".)

For up or down scaling, the Y increment value is not 1.0; it is the inverse of the scaling factor (see "ICP scaling output resolution"). For up scaling by a factor of 2.0, the effective Y increment value is 0.5, for example. This means two output pixels are generated for each input pixel. The Y counter effectively increments as 0.0, 0.5, 1.0, 1.5, 2.0, etc. The LSBs of the counter (i.e. the fractional part less than 1) in the Y LSB register are used by the filter to generate the intermediate values. An LSB value of 0.5 indicates that the output pixel is half way between x_n and x_(n+1). The filter contains a set of 5 filter parameter RAMs, one for each coefficient. The 5 most significant bits of the LSBs from the counter select the filter coefficients which will generate the correct value for the output pixel at the relative offset from 0.0 indicated by the LSBs.

Figure 14-11. ICP horizontal scaling data flow block diagram (block FIFO buffers, Y counter with Y increment integer and fraction, Y LSB register, Y MSB counter, five coefficient RAMs a+2 to a-2, 5-stage multiplier-accumulator, bypass multiplexer, output buffers)
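A software model of the coefficient lookup and weighted sum might look like the sketch below. The coefficient table here is a placeholder that implements plain linear interpolation in the two middle taps; the actual contents of the ICP coefficient RAMs are loaded by software and are not given in this section. The names, the unsigned 0..31 phase, and the division by 32 are likewise assumptions made for illustration.

```python
def make_linear_table():
    """Placeholder 32-entry table of (a-2, a-1, a0, a+1, a+2), scaled by 32."""
    return [(0, 0, 32 - p, p, 0) for p in range(32)]

def filter_output(window, phase, table):
    """Weighted sum of 5 pixels x[n-2..n+2] using the coefficients for `phase`."""
    acc = sum(c * x for c, x in zip(table[phase], window))
    return acc // 32

def scale_line(padded, n_out, increment, table):
    """Scale one line. `padded` holds the line plus 2 extra pixels at each end;
    `increment` is the inverse scale factor in 16.16 fixed point."""
    pos = 0
    out = []
    for _ in range(n_out):
        n = pos >> 16                         # integer part: center pixel index
        phase = (pos & 0xFFFF) >> 11          # top 5 fraction bits
        out.append(filter_output(padded[n:n + 5], phase, table))
        pos += increment
    return out

# Line [10, 20, 30] with two mirrored pixels added at each end, upscaled by 2.
padded = [30, 20, 10, 20, 30, 20, 10]
print(scale_line(padded, 6, 0x8000, make_linear_table()))
```

With a real 5-tap coefficient set, only `make_linear_table` would change; the lookup by 5-bit phase and the multiply-accumulate over five pixels stay the same.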
The Y counter indicates the next pixel from the input buffer. A new pixel is clocked into the filter registers only when the Y counter contents change, which happens when the Y MSB counter is loaded with a value greater than "0". Note that for Y increment values less than 1.0 (up scaling), the change will be caused by carry increment from the Y LSBs, and a new pixel will not be clocked into the filter shift register on every Y clock.

For increment values of 2.0 or for values of 1.0 or greater with carry in (down scaling), multiple new pixels will be clocked into the filter shift register before the filter inputs are ready. The number of new bytes needed for the next pixel is the sum of the Y increment integer value and the carry out of the Y LSB adder. This result is loaded into the Y MSB counter. The filter clock is stalled until the inputs are ready. The integer value of the increment, including carry, defines the number of new pixels to be clocked through the shift register before the filter inputs are ready for use.

In this discussion, the Y counter LSBs form a 16-bit binary number. The upper 5 bits of this 16-bit number form a 5-bit binary number between 0 and 31 representing a fractional distance between Y pixels between 0/32 and 31/32. If the new pixel relative distance is 31/32, it is nearest the right pixel of the two pixels it is between, and the right 2 pixels will be more heavily weighted than the left 3.

The horizontal filter shown in Figure 14-11 is pipelined to generate a pixel for every integer increment of the Y counter. The filter input is always 5 clocks ahead of its output. The first stage generates the filter term a_(n+2) x_(n+2) using the data from the input block and the a_(n+2) coefficient from the coefficient RAM driven by the Y LSBs. The second stage registers hold the data for x_(n+1) and its corresponding Y LSBs and generate a_(n+1) x_(n+1).
The last stage registers hold the data for x_(n-2) and the x_(n-2) LSBs and generate a_(n-2) x_(n-2).

The LSB register contents can change on every clock. In the 2:1 scaling example, the LSBs alternated between 0.0 and 0.5. The LSB counter represents each output pixel's x offset value from the input pixel grid. The LSB increment value is 16 bits long. The 5 upper bits go to the coefficient RAMs, and the 11 lower bits give the LSB counter the precision needed to represent the scaling factor. The 11 lower bits of the LSB increment value added to the 11 lower bits of the LSB counter determine when to increment the 5 LSBs that drive the coefficient RAMs and when to clock a new Y pixel into the filter.

14.5.7.1 Loading the extra pixels in the filter

For a 5-tap filter, 4 more pixel inputs are needed to the filter than are generated at the filter output, two before the first pixel and two after the last pixel. In the worst case of a window that is exactly n blocks wide and starts at the first pixel of the first block, two extra blocks must be read, one at each end of the window, in order to get these 4 pixels. This is an unavoidable problem with a multi-tap filter. For an n-tap filter, n-1 extra pixels are needed. There are two techniques that avoid this efficiency hit of fetching extra blocks.

1. Move the window edges so they are not within 2 pixels of a 64 input pixel boundary.
2. Simulate the edge pixels, such as by mirroring the pair of pixels you have on the other side. This is the only solution to the problem of starting (or ending) at the edge of the image, where there are no pixels to the left (or right) of the image window.

The ICP uses automatic mirroring to supply these pixels. Mirroring is used in both horizontal and vertical filter modes.

14.5.7.2 Mirroring pixels at the ends of a line

A line may start and/or end at the edge of the input image.
In this case, the two start and/or end pixels needed for the first and last pixels of the line, respectively, are missing. The start mirror uses the two pixels to the right of the first pixel, and the end mirror uses the two pixels to the left of the last pixel. These pixels are supplied by controlling the Y counter.

A mirror multiplexer in the 5-tap filter provides mirroring of one or two pixels at the filter inputs. This mirror multiplexer is used for both horizontal and vertical filtering. In horizontal filtering, the first and last two pixels in the line are mirrored. The mirror multiplexer is set to the appropriate mirror code for the first and last two pixels in the line. The first two pixels are mirrored for the first two clock pulses, and the last two pixels are detected using the pixel counter for the line.

Mirroring is optional, depending on whether the start or end of the line is on a window boundary. The DSPCPU or microprogram must detect this and enable start and/or end mirroring as required.

14.5.7.3 Horizontal filter SDRAM timing

Figure 14-13 shows a timing diagram for block data flow between the SDRAM and the filter for a scaling factor of 1.0. The bus block reads and writes are one fourth of the filter processing time because the filter processes data at 100 Mpix/sec, and the SDRAM reads and writes blocks of pixels at 400 Mpix/sec. The SDRAM logic reads the next block while the current block is being processed. This also provides the two pixels from the next block required to finish filtering the current block.

If the scaling factor is greater or less than 1.0, the SDRAM bus activity will be different. For scaling factors greater than 1.0, there will be fewer SDRAM reads for the same number of writes generated by the filter. For example, a scale factor of 2.0 means that it is necessary to read only half as many blocks to generate the same number of output blocks.
For a scale factor less than one, there will be more reads for the same number of writes. For a scale factor of 0.5, two blocks must be read for every block of output. If the scale factor is less than 1/3, more time will be spent reading and writing SDRAM than filtering.
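The 1/3 threshold follows from the two data rates quoted above. The sketch below is a back-of-the-envelope model, not a description of the actual bus scheduling: it assumes the filter time is set by the number of output pixels, that each output block needs 1/scale input blocks read plus one block written, and the function name is hypothetical.

```python
from fractions import Fraction

FILTER_RATE = 100   # Mpix/sec, filter throughput quoted in the text
SDRAM_RATE = 400    # Mpix/sec, SDRAM block read/write rate quoted in the text

def bus_to_filter_ratio(scale):
    """SDRAM time divided by filter time, per block of output pixels."""
    reads = 1 / Fraction(scale)          # input blocks read per output block
    writes = 1                           # one output block written
    bus_time = (reads + writes) / SDRAM_RATE
    filter_time = Fraction(1, FILTER_RATE)
    return bus_time / filter_time

for s in (Fraction(2), Fraction(1), Fraction(1, 2), Fraction(1, 3), Fraction(1, 4)):
    print(s, bus_to_filter_ratio(s))     # ratio above 1 means the bus dominates
```

Under these assumptions the ratio is (1/scale + 1)/4, which reaches 1 exactly at a scale factor of 1/3.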
14.5.8 Implementation overview: vertical scaling and filtering

Figure 14-14 shows a data flow block diagram of the ICP vertical scaling algorithm implementation. Blocks of pixels are loaded sequentially into five input block buffers, one for each of the 5 terms of the 5-tap filter. Each block of pixels is transferred sequentially to the 5-tap filter. The filter does scaling and filtering of the data and puts the resulting pixels in the output buffer. Completed pixels in the output buffer are written back to SDRAM.

In vertical scaling, five separate blocks of pixels, one for each line, are required because the pixels are stored in horizontal sequence in the SDRAM. The Y counter steps through the 64 horizontal pixels of the five input blocks and writes the resulting pixels into the output block. Four of the five blocks are used on the next pass, so that one block of pixels in generates one block of pixels out except for end conditions. The image is processed in 64-pixel columns. Since the image to be filtered will not generally start or end on a block boundary, the number of horizontal pixels for the first and last columns will be less than 64 in these cases. Also, the data in the columns must be aligned vertically. This results in the requirement that the line-to-line address offset value must be a multiple of 64 bytes. Note that only the address offset value is modulo 64; the image to be filtered can start and stop anywhere. Block alignment is not required.

Vertical scaling and filtering processes five 64-pixel input line segments to generate one 64-pixel output segment. When input lines y_(n-2) to y_(n+2) have been processed to generate one 64-pixel output segment for output line y_n, five new input segments are needed for the next output line segment in the 64-pixel column, y_(n+1).
if the vertical scale factor is 1.0 (no scaling), line segments y(n-1) to y(n+2) are reused, a new block for y(n+3) is loaded and the block for line y(n-2) is discarded. to load y(n+3), the mcu adds the y offset value to the block address (upper 26 bits) of the y counter, and the y counter selects the next y block to be read from sdram. the y counter points to the line block address for the last y block loaded, and the y offset value is the address difference between the start of one line and the start of the next, x0y0 to x0y1. the line offset is always an integral number of sdram blocks. the line offset value must be added to the current line address to get the next line address.

up and down scaling use the u counter and u increment value. the u counter is used to detect how many lines must be read (0 to 5) to generate the next output line and to generate the vertical offset fraction for the 5-tap filter for output lines that fall between the input lines. the u counter is set to its starting value (typically "0") at the start of the column, and the u increment value is added to the u counter for each output line segment generated in the column. for a scaling factor of 1.0, the u increment value is 1.0, and each line processed will generate a request for one block. if the scaling factor is 1/2, the increment value will be two, corresponding to moving down two lines. in this case, twice the line offset is added to the y counter value.

for up scaling by a factor of 2.0, the u increment value is 0.5. this means two output lines are generated for each input line. the u counter increments as 0.0, 0.5, 1.0, 1.5, 2.0, etc. the lsbs of the u counter (i.e. the fractional part less than 1) are passed along to the filter to generate the intermediate values. an lsb value of 0.5 means that the output line is half way between y(n) and y(n+1). the filter contains a set of 5 filter parameter rams, one for each coefficient. the 5 most significant lsbs from the counter select the filter coefficients which will generate the correct value for the output pixel at the relative offset from 0.0 indicated by the lsbs.

[figure 14-12. horizontal pixel mirroring. input pixels y1 to y6; output pixels y' = f(y3,y2,y1,y2,y3), f(y2,y1,y2,y3,y4), f(y1,y2,y3,y4,y5), f(y2,y3,y4,y5,y6), f(y3,y4,y5,y6,y5), f(y4,y5,y6,y5,y4). the taps that fall outside the line are mirrored pixels.]

[figure 14-13. sdram and horizontal filter block timing. sdram bus row: read x0, read x1, read x2, read x3, write xa, write xb. filter action row: filter x0 => xa, filter x1 => xb, filter x2 => xc.]

for down scaling, the increment factor will be greater than one. if the increment factor is 2.0, two new blocks will have to be loaded before starting the next vertical filter pass. if the increment factor is 5 or greater, all five blocks must be loaded. the number of blocks to be loaded for the next line is equal to the integer increment value plus carry out from the lsb portion of the u counter increment.

note that the lsb adder carry out is available before the u counter has been updated. this allows the current u counter value lsb bits to be used for the filter coefficients while using the carry out for the next value to predict how many blocks to fetch. the integer value from the u increment value plus the carry in from the lsb portion of the increment adder is the number of blocks to be loaded. these blocks must be sequentially loaded (and not skipped) so that the filter has the necessary 5 adjacent lines to perform the filtering. the contents of the integer portion of the u counter (updated after the add) are not used.

only one new block can be loaded while the current line is being processed. if two or more blocks are needed to process the next line, load one in overlap. wait until the current line is done, then load the rest of the blocks.

the microprogram only has to make two decisions for the next line: is the increment value "0" or greater than "0", and if greater than "0", is it five or greater. if it is "0", do nothing: you will reuse all five blocks. if it is 1-4, load the next block.
if it is five or more, calculate the address of the first block -- by adding n times the address offset to the y counter -- and fetch it.

when a new block is loaded and it is time to process the next line, the block which was y(n+2) becomes y(n+1). the y blocks, in effect, shift up one line as you scan down the image. this shifting action is implemented by shifting the block select codes in the filter source select register (fssr). the fssr contains six 3-bit register fields. these 3-bit fields are rotated by a shift command to the fssr. the outputs of five of the fssr fields go to the input multiplexer, which selects the next block combination and sends it to the filter. the output of the sixth field is the free block to be filled for the next line while the current line is being processed. the select code is also the block code (0 to 5), so the free block is identified by its block code in the fssr. the fssr codes for the six cases of vertical filtering are shown in table 14-4.

[figure 14-14. icp vertical scaling data flow block diagram. blocks shown: sdram; block fifo with yn-2, yn-1, yn+0, yn+1 and yn+2 buffers; 6 in x 5 out multiplexer driven by the filter source select (fssr); 5-tap filter with a-2, a-1, a+0, a+1 and a+2 rams; u counter (u incr integer, u incr fraction, u lsbs, u lsb reg, u msb cntr, carry); y counter; z counter; output buffers 6,7; block count to microcode; block address to sdram.]
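the u counter bookkeeping described above can be sketched in c as follows. this is only an illustrative model, not the icp microcode: the type and function names are hypothetical, and the 16-bit width of the fraction is an assumption taken from the 16-bit fraction increment used in the parameter tables. the sketch uses the current fraction to pick one of 32 coefficient sets (its 5 most significant bits) and uses the carry out of the fraction adder, plus the integer increment, to predict how many blocks to load, capped at five.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical software model of the u counter fraction (16 bits assumed). */
typedef struct {
    uint16_t frac;              /* fractional lsbs of the u counter */
} u_counter_t;

/* Result of one output-line step. */
typedef struct {
    unsigned coef_index;        /* 0..31, from the 5 msbs of the fraction */
    unsigned blocks_to_load;    /* 0..5 blocks needed for the next line   */
} u_step_t;

/* Advance the model by one output line.
 * inc_int / inc_frac are the integer and fractional parts of the u increment. */
static u_step_t u_counter_step(u_counter_t *u, uint16_t inc_int, uint16_t inc_frac)
{
    u_step_t r;
    uint32_t sum;
    unsigned carry;

    assert(u != 0);

    /* The current fraction selects the coefficients for this output line. */
    r.coef_index = (unsigned)(u->frac >> 11);

    /* The carry out of the fraction adder is known before the counter updates. */
    sum = (uint32_t)u->frac + (uint32_t)inc_frac;
    carry = (unsigned)(sum >> 16);

    /* Blocks to load = integer increment + carry; five or more means reload all. */
    r.blocks_to_load = (unsigned)inc_int + carry;
    if (r.blocks_to_load > 5u) {
        r.blocks_to_load = 5u;
    }

    u->frac = (uint16_t)sum;    /* keep only the fractional part */
    return r;
}
```

with an increment of 0.5 (inc_int = 0, inc_frac = 0x8000) the model alternates between loading no block and loading one block, which matches the statement that two output lines are generated for each input line.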
14.5.8.1 mirroring lines at the ends of an image

a window may start and/or end at the edge of the input image. in this case, the two start and/or end lines needed for the first and last lines of the window, respectively, are missing. these pixels are supplied by the mirror multiplexer at the 5-tap filter, which mirrors the input lines. the mirror multiplexer is controlled by the mirror counter and mirror end register in the same manner as in horizontal filtering. the mirror register in vertical filtering is incremented by the output line counter. mirroring is performed on the first two and last two lines of the column. mirroring is optional, depending on whether the start or end of the line is on a window boundary. the dspcpu or microprogram must detect this and enable start and/or end mirroring as required.

14.5.8.2 vertical filter sdram block timing

figure 14-15 shows a timing diagram for block data flow between the sdram and the filter for a scaling factor of 1.0. the bus block reads and writes require one fourth of the filter processing time because the filter processes data at 100 mpix/sec, and the sdram reads and writes blocks of pixels at 400 mpix/sec (peak). the vertical filter starts by reading in the five blocks necessary to generate the next output block. while the current block is being processed, the next block is read from sdram to prepare for the next output block.

14.5.9 horizontal scaling and filtering for rgb output

figure 14-16 shows a data flow block diagram of the icp horizontal scaling to rgb output algorithm implementation. the six input block buffers are arranged as three block fifos, one each for y, u and v pixel streams. these three streams are sequentially filtered, pixel by pixel, by the 5-tap filter to generate a scaled output sequence of y, u, v, y, u, v, etc.
this yuv stream is fed to the yuv to rgb converter where it is converted to one of several rgb output formats, blended with rgb overlay pixels supplied by the overlay fifo and masked by bit mask pixels from the bit mask block. the resulting scaled, converted, overlay blended and masked rgb stream is sent to the pci interface -- typically to an rgb format frame buffer on the pci bus -- or to sdram.

the input pixel streams from the input fifos are transferred sequentially to the 5-tap filter. each stream has its own set of four-stage delay registers used to perform horizontal filtering on the stream. a pair of 3-way multiplexers switch the five filter data inputs and the 5-bit filter coefficient select codes to the 5-tap filter. this set of multiplexers is driven by the yuv sequence counter, a 2-bit counter that provides the yuv processing sequence.

in horizontal scaling and filtering from sdram to sdram, each y, u and v component is filtered separately as a complete image. in rgb output horizontal scaling and filtering, the image is processed as three interwoven streams of all three yuv components.

in the rgb output mode, the icp normally generates rgb data and writes it into a frame buffer memory on the pci bus or to the sdram. the frame buffer memory format is rgb with one r, one g and one b value per pixel. this could be called rgb 4:4:4. to generate this image, the icp generates a yuv 4:4:4 image and converts it to rgb. this process is done one rgb output pixel at a time. the icp generates a u pixel and saves it in a register, generates a v pixel and saves it in a register, then generates a y pixel for output. the yuv to rgb converter combines each y pixel as it is generated with the previously stored u and v pixels to generate the rgb output data. this process is repeated until the whole image has been converted and sent to the pci bus or sdram.
14.5.9.1 yuv sequence counter in yuv 4:2:2 output mode

for rgb output formats, the yuv data must be scaled to yuv 4:4:4 format before conversion to rgb. the yuv data in sdram is typically stored in yuv 4:2:2. this means that the u and v data must be upscaled by 2 relative to the y data to generate the internal yuv 4:4:4 format required for rgb conversion.

for the yuv 4:2:2 output formats, the u and v data do not need to be up scaled to 4:4:4. the yuv 4:4:4 data would be upscaled only to be decimated back to yuv 4:2:2. for yuv 4:2:2 output, the u and v pixels are used twice. this is done by having a half-speed mode for the yuv sequence counter. in this mode, the sequence is u0, v0, y0, y1, u2, v2, y2, y3, etc. the u and v are not up scaled by 2 relative to the y component for yuv 4:4:4 output, although they could be up scaled as part of general up scaling of the image.

table 14-4. fssr codes for vertical filtering
case  pn-2  pn-1  pn+0  pn+1  pn+2  io block
1     5     4     3     2     1     0
2     0     5     4     3     2     1
3     1     0     5     4     3     2
4     2     1     0     5     4     3
5     3     2     1     0     5     4
6     4     3     2     1     0     5

[figure 14-15. sdram and vertical filter block timing. sdram bus row: read y5, read y6, read y7, read y8, write ya, write yb. filter action row: filter y2-5 => ya, filter y3-6 => yb, filter y4-7 => yc.]

the yuv 4:2:2 output mode also provides higher processing bandwidth relative to yuv 4:4:4 up scaling. half as many u and v pixels are processed. the output pixel rate is one pixel per 20 nanoseconds for the yuv 4:2:2 output mode versus one pixel per 30 nanoseconds for conversion to yuv 4:4:4. this can be used to provide some processing performance improvement for very large images at the expense of some chroma quality.

14.5.9.2 pci output block timing

the icp outputs pixels to the pci interface at a peak rate of 33 mpix/sec in rgb mode and 50 mpix/sec in the yuv mode using yuv sequencing. for one word per pixel output codes, such as rgb-24, this is a peak rate of 33 mwords/sec or 132 mb/sec in the rgb sequencing mode. this is the same speed as the 132 mb/sec peak rate of the pci interface. (at 50 mpix/sec, the result would be 200 mb/sec.) the biu control for the pci interface has a fifo for buffering data from the icp, but this buffer is only 16 words deep. therefore, the icp will occasionally have to wait for the pci to accept more data. in the pci output mode, this stalls the icp clock.

14.6 operation and programming

the icp uses a combination of hardware and a microprogram control unit (mcu) to implement its scaling, filtering and conversion functions.
the microprogram is a factory-supplied state machine that resides in sdram. it is read each time the icp executes an operation. using an sdram-resident microprogram-controlled state machine minimizes hardware and provides flexibility in handling special conditions without additional hardware.

[figure 14-16. icp horizontal scaling for rgb output data flow block diagram. blocks shown: y, u and v block fifos (buffers 0,1; 2,3; 4,5), each with its counter, lsb counter and four delay registers; mirror multiplexer with y, u and v mirror counters; y, u, v select multiplexer driven by the yuv sequence counter; 5-tap filter (5 stage multiplier-accumulator) with a-2 to a+2 rams and filter source select; overlay fifo (buffers 6,7) with ol counter; bit mask (buffer 8) with b, bx counter; yuv to rgb conversion, formatting, alpha blending and bit masking; outputs for the rgb to sdram case and the rgb to pci case.]

important note: you must set the icp dma enable bit (ie) in the biu_ctl register of the pci interface for rgb output to pci. this bit must be set before initiating rgb to pci operations, or the icp will stall waiting for the pci to become ready. refer to section 11.6.5, "biu_ctl register."

14.6.1 icp register model

the icp is controlled by the dspcpu through five mmio registers: the microprogram counter (mpc), the micro instruction register (mir), the data pointer (dp), the data register (dr) and the icp status register (sr), as shown in figure 14-17. the mpc, dp and sr are used in normal operations, and the mir and dr are used in test and debug. note that the mmio registers should never be written while the icp is executing microcode, i.e. test the busy bit in the sr register before writing any icp mmio register.

the mpc is the mcu instruction counter. it points to the next microinstruction to be executed. the entry point in the microprogram defines which icp operation is to be executed. the dp points to the location in sdram of a table of parameters used by the icp to process the image data, such as the image input and output start addresses, scaling factor, etc.

the sr has 13 active bits: busy (b), done (d), done interrupt enable (ie), ack_done (a), little endian (l), step (s), diagnostic (dg), reset (r), priority delay (pd, 4 bits). bits 12 .. 30 are reserved.
- (b)usy indicates the icp is busy executing microcode.
- (d)one indicates that the previous requested function is complete, and that the icp clock is stopped.
- (d)one causes an interrupt to the dspcpu when interrupt enable is set.
- (a)ck_done clears (d)one and the corresponding interrupt.
- (l)ittle endian sets the highway endian swap multiplexer to little endian mode for data on the sdram bus.
- (s)tep causes the mcu to execute one microinstruction. step is used for diagnostics to step the icp through its microinstructions one clock step at a time. writing a "1" to step sets busy, which is reset at the end of execution of the next microinstruction.
- (dg) allows sdram operations in step mode.
- (r) is a write-only bit that resets icp internal registers.
- (pd) sets a timer for bus activity that defines the minimum bus bandwidth available to the icp.

the icp status register contains 20 read-only status bits. the upper 16 bits of the status register can contain a 16-bit code returned by the microprogram upon completion. bits 15 through 12 are reserved for error flags.

[figure 14-17. icp mmio registers. registers shown: microprogram counter (mpc, icp_mpc), data pointer (dp, icp_dp), icp status (icp_sr), microinstruction register (mir, icp_mir) and data register (dr, icp_dr). mmio offsets shown: 0x10 2400, 0x10 2404, 0x10 2408, 0x10 2410, 0x10 2414. icp_sr fields shown: b, d, ie, a, l, s, dg, r and priority delay.]

14.6.2 power down

the icp block enters the power down state whenever pnx1300 is put in global power down mode.
the icp block can be separately powered down by setting a bit in the block_power_down register. refer to chapter 21, "power management." it is recommended that the icp be in an idle state before block level power down is activated.

14.6.3 icp operation

the dspcpu commands the icp to perform an operation by loading the dp with a pointer to a parameter block, loading the mpc with a microprogram start address and setting busy in the sr. for example, to cause the icp to scale and filter an image, set up a block of sdram with the image and filter parameters, load the mpc with the starting address of the appropriate microprogram entry point in sdram, load the dp with the address of the parameter block, and set busy in the sr by writing a "1" to it. when the filter operation is complete, the icp will set done and issue an interrupt. the dspcpu clears the interrupt by writing a "1" to ack_done. note: the interrupt should be set up as "level triggered."

when the dspcpu sets busy, the mcu begins reading the microprogram from sdram. the microinstructions are read in from sdram as required by the icp, and internal pre-fetching is used to eliminate delays. setting busy enables the mcu clock, the first block of microinstructions is automatically read in, and the mcu begins instruction execution at the current address in the mpc.

clearing busy stops the mcu clock. busy can be cleared by hardware reset, by the mcu, or by the dspcpu. hardware reset clears the status register, including busy and done, and internal registers, such as the tcr. when the mcu completes a microprogram operation, the microprogram typically clears busy and sets done, causing an interrupt if ie is enabled.

the dspcpu performs a software reset by clearing (writing a "0" to) busy and by writing a "1" to reset. the dspcpu can also set done to force a hardware interrupt, if desired.
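the start sequence in section 14.6.3 could look roughly like the c sketch below. it is a hedged illustration rather than driver code: the struct, the function name and the bit masks are hypothetical, the bit positions are an assumption read from the field order b, d, ie, a given for the status register, and the register block is passed in as a pointer so that the same code can be exercised against an ordinary struct instead of real mmio.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed status register bit positions (field order b, d, ie, a). */
enum {
    ICP_SR_BUSY     = 1 << 0,
    ICP_SR_DONE     = 1 << 1,
    ICP_SR_IE       = 1 << 2,
    ICP_SR_ACK_DONE = 1 << 3
};

/* Hypothetical view of the three registers used in normal operation.
 * On hardware these would be the mmio words icp_mpc, icp_dp and icp_sr. */
typedef struct {
    volatile uint32_t mpc;   /* microprogram counter */
    volatile uint32_t dp;    /* data pointer         */
    volatile uint32_t sr;    /* icp status           */
} icp_regs_t;

/* Start one icp operation.
 * entry_addr:       sdram address of the microprogram entry point
 * param_table_addr: sdram address of the parameter table
 * sr_config:        extra status bits to write with busy (for example ICP_SR_IE)
 * Returns 0 on success, -1 if the icp is still busy (registers untouched). */
static int icp_start(icp_regs_t *icp, uint32_t entry_addr,
                     uint32_t param_table_addr, uint32_t sr_config)
{
    assert(icp != 0);

    /* Never write icp mmio registers while microcode is executing. */
    if (icp->sr & (uint32_t)ICP_SR_BUSY) {
        return -1;
    }

    icp->dp  = param_table_addr;                   /* pointer to parameter block   */
    icp->mpc = entry_addr;                         /* microprogram start address   */
    icp->sr  = sr_config | (uint32_t)ICP_SR_BUSY;  /* writing 1 to busy starts mcu */
    return 0;
}
```

after the done interrupt, the handler would write a "1" to ack_done as described above; that step is left out of the sketch because its effect cannot be modelled with a plain struct.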
14.6.4 icp microprogram set

the icp comes with a factory-generated microprogram set which implements the functions of the icp. the microprogram set includes the following functions:
1. loading the filter coefficient rams.
2. horizontal scaling and filtering from sdram to sdram of an input image to an output image. the input and output images can be of any size and position that fits in sdram. the scaling factors are, in general, limited only by input and output image sizes.
3. vertical scaling and filtering from sdram to sdram of an input image to an output image. the input and output images can be of any size and position that fits in sdram. the scaling factors are, in general, limited only by input and output image sizes.
4. horizontal scaling, filtering and yuv to rgb conversion of an input image from sdram to an output image to pci or sdram, with an alpha-blended and chroma-keyed rgb overlay and a bit mask. the input and output images can be of any size and position that fits in sdram and can be output to the pci bus or sdram. in general, scaling factors are limited only by input and output image sizes.

the microprogram is supplied with the icp as part of the device driver. the entry point in the microprogram defines which icp operation is to be done. the entry points are given below in terms of word offsets from the beginning of the microprogram:

offset  function
0       load coefficients
1       horizontal scaling and filtering
2       vertical scaling and filtering
3       horizontal scaling, filtering, yuv to rgb conversion, bit masking (pci) and overlay (pci) with alpha blending and chroma keying

14.6.5 icp processing time

the processing time for typical operations on typical picture sizes has been measured. measurements were performed with the following configuration:
- cpu clock and sdram clock set to 100 mhz.
- pci clock set to 33 mhz.
- all measurements with pci as pixel destination were done with an imagine 128 series ii graphics card, which never caused a slowdown of the icp operation.
- triton2 mother-board with sb82437ux and sb82371sb based intel pentium chipset.
- pnx1300 arbiter set to default settings.
- pnx1300 latency timer set to maximum value = 0xf8.
- overlay sizes were the same as picture sizes.

results are tabulated below for three different cases of available memory bandwidth:
1. no other load to sdram, i.e. full sdram bandwidth available for icp. see table 14-5.
2. sdram memory loaded to 95% of its bandwidth by dcache traffic from dspcpu. priority delay = 1, i.e. the icp waited one block time before competing for memory. see table 14-6.
3. sdram memory loaded to 95% of its bandwidth by dcache traffic from dspcpu. priority delay = 16, i.e. the icp waited 16 block times before competing for memory. see table 14-7.

note: a load of 95% of the memory bandwidth is very rarely found in a real system. so the results in these tables may be useful to estimate upper bounds for the computation time in a loaded system.

the priority delays were set to the minimum and maximum possible values, so the computation time for other priority delay values should be somewhere in between.
a simple linear model of computation time has been fitted to the tabular data and to corresponding measurements with half the number of pixels per line. it was assumed that

processing time = (time per line start) * (number of lines) + (time per pixel) * (number of pixels)

table 14-8, table 14-9 and table 14-10 give the time per line start and the time per pixel in this equation for the three memory bandwidth cases.

the maximum deviation between measured time and fitted model is on the order of 10% in the range w = 180 ... 1024, h = 240 ... 768. the deviation is much less in most cases. the values were found by least squares fit to the measured data.

in some cases the cumulative time for line starts contributed so little to the total computation time that the value per line start could only be determined relatively inaccurately. in other words, the pixel time portion dominated the equation so much that the line time portion was negligible, given the inaccuracies of the model.

therefore the simple model is only thought to allow interpolation for other picture sizes within the range w = 180 ... 1024, h = 240 ... 768. extrapolation to picture sizes much outside this range should not be attempted using this data. in some cases the real icp performance may be much better than that predicted by the model, due to irregular behavior of the icp.

for horizontal and vertical up/down-scaling operations, use the larger w or h value occurring at input/output with the h/v filter times table or model. this will lead to overestimation of processing time by up to 20%.

table 14-5. measured processing time in ms - no other load to sdram
w in pixels                                 360     640     720     720     800     800     1024
h in pixels                                 240     480     480     768     480     600     768
horizontal filter, 1 component              1.22    3.82    4.43    7.08    4.78    5.98    9.27
horizontal filter, 3 components yuv 4:2:2   2.68    8.18    9.29    14.86   10.08   12.60   19.35
vertical filter, 1 component                2.57    8.73    10.24   16.36   11.19   13.97   22.30
vertical filter, 3 components yuv 4:2:2     5.15    17.47   20.48   32.72   22.95   28.65   44.60
yuv to rgb8a, pci output                    3.36    10.74   11.93   19.08   13.04   16.30   26.02
yuv to rgb15a, pci output                   3.39    10.79   11.96   19.12   13.10   16.41   26.15
yuv to rgb24, pci output                    3.72    12.24   13.52   21.62   14.85   18.59   29.98
yuv to rgb24a, pci output                   4.34    14.52   16.04   25.02   17.58   21.63   35.01
yuv to rgb8a, sdram output                  3.39    10.78   11.95   19.09   13.13   16.40   26.08
yuv to rgb15a, sdram output                 3.46    11.04   12.26   19.60   13.46   16.82   26.87
yuv to rgb24, sdram output                  3.62    11.69   13.06   20.88   14.43   18.03   28.71
yuv to rgb24a, sdram output                 3.90    12.69   14.11   22.57   15.65   19.56   31.07
yuv to rgb8a, bitmask, pci output           3.37    11.42   12.49   19.97   13.61   17.01   27.83
yuv to rgb8a, rgb 15a overlay, pci output   3.67    11.72   12.92   20.67   14.23   17.79   28.23
yuv to rgb8a, rgb 24a overlay, pci output   4.23    13.57   15.32   24.51   16.93   21.15   33.15
yuv to rgb8a, yuv 422a overlay, pci output  3.67    11.72   12.92   20.67   14.23   17.79   28.23
yuv to rgb8a, 422 sequencing, pci output    2.52    7.77    8.57    13.70   9.32    11.65   18.40

table 14-6. measured processing time in ms - sdram loaded 95%, priority delay = 1
w in pixels                                 360     640     720     720     800     800     1024
h in pixels                                 240     480     480     768     480     600     768
horizontal filter, 1 component              2.01    6.37    7.60    12.16   8.02    10.02   16.02
horizontal filter, 3 components yuv 4:2:2   4.11    13.69   15.62   24.96   16.56   20.68   32.65
vertical filter, 1 component                2.60    8.79    10.34   16.50   11.25   14.05   22.43
vertical filter, 3 components yuv 4:2:2     5.20    17.59   20.66   32.96   23.15   28.89   44.87
yuv to rgb8a, pci output                    3.51    11.08   12.17   19.46   13.51   16.88   26.56
yuv to rgb15a, pci output                   3.52    11.11   12.22   19.51   13.47   16.82   26.65
yuv to rgb24, pci output                    3.88    12.51   13.79   22.08   15.21   18.99   30.26
yuv to rgb24a, pci output                   4.39    14.29   15.84   25.30   17.72   22.00   34.83
yuv to rgb8a, sdram output                  3.69    11.67   12.75   20.39   14.20   17.80   27.95
yuv to rgb15a, sdram output                 4.25    13.15   14.64   23.41   16.79   20.98   31.49
yuv to rgb24, sdram output                  5.17    16.56   18.71   29.90   20.85   26.06   40.82
yuv to rgb24a, sdram output                 5.82    18.64   21.02   33.62   23.23   29.03   45.34
yuv to rgb8a, bitmask, pci output           3.65    12.37   13.45   21.50   14.68   18.34   30.13
yuv to rgb8a, rgb 15a overlay, pci output   4.94    15.30   17.23   27.51   19.06   23.78   36.70
yuv to rgb8a, rgb 24a overlay, pci output   6.77    21.93   24.85   39.73   27.44   34.31   53.67
yuv to rgb8a, yuv 422a overlay, pci output  4.95    15.30   17.22   27.51   19.06   23.80   36.70
yuv to rgb8a, 422 sequencing, pci output    3.04    8.92    9.63    15.39   10.53   13.16   20.37

table 14-7. measured processing time in ms - sdram loaded 95%, priority delay = 16
w in pixels                                 360     640     720     720     800     800     1024
h in pixels                                 240     480     480     768     480     600     768
horizontal filter, 1 component              7.70    24.28   29.32   46.90   30.05   37.56   60.39
horizontal filter, 3 components yuv 4:2:2   15.28   52.00   60.08   96.10   63.13   78.90   123.29
vertical filter, 1 component                7.50    26.71   30.92   49.31   33.57   41.93   68.18
vertical filter, 3 components yuv 4:2:2     14.48   53.45   60.70   96.83   68.69   85.79   136.40
yuv to rgb8a, pci output                    10.55   31.61   34.95   55.84   37.18   46.47   74.29
yuv to rgb15a, pci output                   10.55   31.61   34.93   55.84   37.17   46.45   74.29
yuv to rgb24, pci output                    10.39   31.71   34.93   55.84   37.25   46.54   73.58
yuv to rgb24a, pci output                   10.49   31.95   35.06   55.98   37.15   46.46   74.10
yuv to rgb8a, sdram output                  13.83   41.93   48.10   76.94   51.57   64.42   99.33
yuv to rgb15a, sdram output                 17.58   55.55   60.95   97.49   65.82   82.24   137.71
yuv to rgb24, sdram output                  20.25   65.46   74.67   119.44  81.74   102.12  158.43
yuv to rgb24a, sdram output                 24.05   78.51   88.98   142.21  98.69   125.67  196.99
yuv to rgb8a, bitmask, pci output           11.05   35.04   37.75   60.37   40.15   50.19   85.13
yuv to rgb8a, rgb 15a overlay, pci output   18.19   57.11   62.60   100.04  70.84   88.26   136.03
yuv to rgb8a, rgb 24a overlay, pci output   24.81   80.19   91.86   145.57  100.72  125.00  198.15
yuv to rgb8a, yuv 422a overlay, pci output  18.20   57.11   62.60   100.04  70.00   88.28   135.98
yuv to rgb8a, 422 sequencing, pci output    10.56   31.09   34.79   55.63   36.27   45.33   74.43
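the linear model from section 14.6.5 can be evaluated with a few lines of c. this is a minimal sketch under two assumptions: the function name is hypothetical, and "number of pixels" is taken to mean width times height of the picture. the coefficients in the usage note are the table 14-8 values for "horizontal filter, 1 component" with no other load on sdram.

```c
#include <assert.h>

/* Estimate icp processing time in milliseconds with the linear model:
 *   time = t_linestart * lines + t_pixel * pixels
 * t_linestart_us: time per line start in microseconds (tables 14-8 to 14-10)
 * t_pixel_ns:     time per pixel in nanoseconds       (tables 14-8 to 14-10)
 * Assumption: the pixel count is width * height. */
static double icp_time_ms(double t_linestart_us, double t_pixel_ns,
                          unsigned width, unsigned height)
{
    double lines;
    double pixels;

    assert(t_linestart_us >= 0.0 && t_pixel_ns >= 0.0);

    lines  = (double)height;
    pixels = (double)width * (double)height;

    /* Convert microseconds and nanoseconds to milliseconds before summing. */
    return (t_linestart_us * lines) / 1.0e3 + (t_pixel_ns * pixels) / 1.0e6;
}
```

for a 720 x 480 picture with 1.1 microseconds per line start and 11 ns per pixel, the estimate is about 4.33 ms, against 4.43 ms measured in table 14-5, which is inside the roughly 10% deviation quoted for the model.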
table 14-8. line start and pixel time for linear model, no other load on sdram
function                                    t/linestart (µs)  t/pixel (ns)
horizontal filter, 1 component              1.1               11
horizontal filter, 3 components yuv 4:2:2   3.2               22
vertical filter, 1 component                0.2               29
vertical filter, 3 components yuv 4:2:2     0.7               58
yuv to rgb8a, pci output                    3.2               30
yuv to rgb15a, pci output                   3.3               30
yuv to rgb24, pci output                    3.7               34
yuv to rgb24a, pci output                   5.3               40
yuv to rgb8a, sdram output                  3.4               30
yuv to rgb15a, sdram output                 3.3               31
yuv to rgb24, sdram output                  3.1               33
yuv to rgb24a, sdram output                 3.4               36
yuv to rgb8a, bitmask, pci output           2.5               32
yuv to rgb8a, rgb 15a overlay, pci output   3.8               32
yuv to rgb8a, rgb 24a overlay, pci output   4.0               39
yuv to rgb8a, yuv 422a overlay, pci output  3.8               32
yuv to rgb8a, 422 sequencing, pci output    3.2               20

table 14-9. line start and pixel time for linear model, sdram loaded 95%, priority delay = 1
function                                    t/linestart (µs)  t/pixel (ns)
horizontal filter, 1 component              0.9               20
horizontal filter, 3 components yuv 4:2:2   2.8               40
vertical filter, 1 component                0.2               29
vertical filter, 3 components yuv 4:2:2     0.7               58
yuv to rgb8a, pci output                    3.8               30
yuv to rgb15a, pci output                   3.8               30
yuv to rgb24, pci output                    4.5               34
yuv to rgb24a, pci output                   6.0               39
yuv to rgb8a, sdram output                  4.3               31
yuv to rgb15a, sdram output                 4.9               36
yuv to rgb24, sdram output                  4.6               47
yuv to rgb24a, sdram output                 5.0               53
yuv to rgb8a, bitmask, pci output           3.2               34
yuv to rgb8a, rgb 15a overlay, pci output   5.5               42
yuv to rgb8a, rgb 24a overlay, pci output   5.8               63
yuv to rgb8a, yuv 422a overlay, pci output  5.5               42
yuv to rgb8a, 422 sequencing, pci output    4.9               21

table 14-10. line start and pixel time for linear model, sdram loaded 95%, priority delay = 16
function                                    t/linestart (µs)  t/pixel (ns)
horizontal filter, 1 component              2.9               77
horizontal filter, 3 components yuv 4:2:2   8.7               154
vertical filter, 1 component                0.4               87
vertical filter, 3 components yuv 4:2:2     1.2               174
yuv to rgb8a, pci output                    13.9              82
yuv to rgb15a, pci output                   13.8              82
yuv to rgb24, pci output                    13.7              82
yuv to rgb24a, pci output                   14.0              82
yuv to rgb8a, sdram output                  15.8              115
yuv to rgb15a, sdram output                 18.5              151
yuv to rgb24, sdram output                  17.5              187
yuv to rgb24a, sdram output                 16.6              233
yuv to rgb8a, bitmask, pci output           14.3              91
yuv to rgb8a, rgb 15a overlay, pci output   20.7              153
yuv to rgb8a, rgb 24a overlay, pci output   21.6              232
yuv to rgb8a, yuv 422a overlay, pci output  20.8              153
yuv to rgb8a, 422 sequencing, pci output    14.0              80

14.6.6 priority delay and icp minimum bus bandwidth

the priority delay field in the status register sets the time the icp will wait for sdram service before changing from a low-priority bus request to a high-priority request. the icp normally requests sdram bus service at the lowest-priority level, since it is a background processing device. in some cases, service to the icp could be continuously delayed by other background devices, such as the vld processor, or by high-priority requests from the dspcpu.

the pd field sets a timer on the currently active bus request. the timer is loaded with the pd value and started each time a bus request is submitted. the timer is incremented once each block time, the time required to load one block of 64 bytes. if the timer reaches 16 before the request is serviced, the icp changes its bus request priority from low to high.

the resulting time delay until the icp changes to high priority is:

timer delay = (16 - pd) * (block time)

one block time is 16 clock cycles.
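the priority delay formula from section 14.6.6 can be written out in c as below. this is a small sketch with hypothetical function names; it only encodes the two facts stated in that section, that the delay is (16 - pd) block times and that one block time is 16 clock cycles.

```c
#include <assert.h>

/* Delay, in block times, before the icp raises its bus request to high priority.
 * pd is the 4-bit priority delay field (0..15) of the status register. */
static unsigned icp_priority_delay_blocks(unsigned pd)
{
    assert(pd <= 15u);
    return 16u - pd;
}

/* Same delay expressed in clock cycles, using 16 clock cycles per block time. */
static unsigned icp_priority_delay_clocks(unsigned pd)
{
    return icp_priority_delay_blocks(pd) * 16u;
}
```

a pd code of 1111 (15) gives the shortest delay of one block time (16 clock cycles), and a pd code of 0000 gives the longest delay of 16 block times (256 clock cycles).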
table 14-11 gives the delay in block times as a function of the pd field.

the priority delay mechanism in interaction with the arbiter mechanism allows the user to allocate enough bandwidth for the icp to do its processing in the required frame time. for details of the arbiter mechanism see chapter 20, "arbiter."

14.6.7 icp parameter tables

each microprogram in the microprogram set has an associated parameter table used by the icp to process the image data, such as the image input and output start addresses, scaling factor, etc. the dp points to the location in sdram of the first word of the parameter table. the parameter table address must be word aligned. the parameter table can be more than one sdram block (16 32-bit words) long.

note: in packed rgb24 to pci operation the output address offset from the start of video memory must be a multiple of 6 bytes, i.e. on an even pixel boundary.

14.6.8 load coefficients

this routine loads the filter coefficient rams with coefficient data in the parameter table. a total of 32 sets of five 10-bit coefficients are loaded. each set of five coefficients forms a 50-bit coefficient word. two coefficients are stored in each 32-bit word in sdram. three 32-bit words are used for each set of five coefficients that form a coefficient word. the parameter table is 96 words (6 sdram blocks) long. each coefficient is stored as the 10 lsbs of each 16-bit half word of the 32-bit word.

the parameter table for the coefficient load function contains the coefficient data directly, as shown in table 14-12.

14.6.9 horizontal filter - sdram to sdram

this routine performs horizontal scaling and filtering of one component (y, u or v) of an n x m image from one location in sdram to another.
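the coefficient packing described in section 14.6.8 can be sketched in c as follows. the function and macro names are hypothetical, and the placement of the coefficients within each group of three words (a+2 and a+1, then a+0 and a-1, then a-2 and 0, upper half word first) is taken from the load coefficients parameter table (table 14-12). whether the 10-bit values are signed is not stated here, so the sketch simply keeps the 10 lsbs of whatever value is passed in.

```c
#include <assert.h>
#include <stdint.h>

#define ICP_COEF_SETS      32   /* 32 coefficient words                   */
#define ICP_COEF_TAPS       5   /* a+2, a+1, a+0, a-1, a-2 (in that order) */
#define ICP_WORDS_PER_SET   3   /* three 32-bit words per coefficient word */

/* Pack two coefficients into one 32-bit word: each coefficient occupies
 * the 10 lsbs of its 16-bit half word. */
static uint32_t icp_pack_pair(int16_t upper, int16_t lower)
{
    return (((uint32_t)upper & 0x3FFu) << 16) | ((uint32_t)lower & 0x3FFu);
}

/* Build the 96-word load coefficients parameter table.
 * coef[set][0..4] holds a+2, a+1, a+0, a-1, a-2 for each of the 32 sets. */
static void icp_pack_coefficients(int16_t coef[ICP_COEF_SETS][ICP_COEF_TAPS],
                                  uint32_t table[ICP_COEF_SETS * ICP_WORDS_PER_SET])
{
    int set;

    assert(coef != 0 && table != 0);

    for (set = 0; set < ICP_COEF_SETS; set++) {
        uint32_t *w = &table[set * ICP_WORDS_PER_SET];
        w[0] = icp_pack_pair(coef[set][0], coef[set][1]);  /* a+2 | a+1 */
        w[1] = icp_pack_pair(coef[set][2], coef[set][3]);  /* a+0 | a-1 */
        w[2] = icp_pack_pair(coef[set][4], 0);             /* a-2 | 0   */
    }
}
```

the resulting array is the 96-word parameter table that the dp would point to when the load coefficients entry point is started.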
Table 14-11. ICP priority delay vs. PD code

PD code   Delay (block times)
1111       1
1110       2
1101       3
1100       4
1011       5
1010       6
1001       7
1000       8
0111       9
0110      10
0101      11
0100      12
0011      13
0010      14
0001      15
0000      16

Table 14-12. Load coefficients parameter table

Upper 2 bytes   Lower 2 bytes   Description
a+2             a+1             RAM coefficient word 0
a+0             a-1
a-2             0
a+2             a+1             RAM coefficient word 1
a+0             a-1
a-2             0
...             ...             ...
a+2             a+1             RAM coefficient word 31
a+0             a-1
a-2             0

14.6.9.1 Algorithms

The routine reads image data from SDRAM using the Y address counter, then scales and filters the data in the horizontal direction and writes it back to SDRAM using the Z address counter. The 5-tap filter scales and filters the data. The LSB increment value supplied by the parameter table determines the scaling. The routine reads and writes a line at a time until the full image is transferred. The filter mirrors the ends of each line to provide the extra pixels it needs there.

14.6.9.2 Parameter Table

The parameter table, shown in Table 14-13, supplies the input and output starting addresses and offsets, the image height in lines and width in pixels, and the increment value, which is derived from the scale factor.

Table 14-13. Horizontal filter parameter table

Upper 2 bytes              Lower 2 bytes              Description
Input image start address (32 bits)                   Start address of x0y0 (byte address)
Y counter start fraction   Input image line offset    Starting value: may be 0.5, etc. for interspersed convert; line offset from x0y0 to x0y1
Fraction increment         Integer increment          Increment value for Y = 1/scale factor
Input image height         Input image width          Height and width in input lines and pixels
Output image start address (32 bits)                  Start address of x0y0 (byte address)
Control                    Output image line offset   Control bits; line offset from x0y0 to x0y1
Output image height        Output image width         Height and width in output lines and pixels

The input and output addresses are the byte addresses of their respective tables. They do not need to be word- or block-aligned.

The input and output line offsets define the difference in bytes from the address of the first pixel in the first line to the address of the first pixel in the second line for their respective blocks. The line offset must be constant for all lines in each table. The line offset allows some space between the end of one line and the start of the next line. It also allows the ICP to scale and filter a subset of an existing image, such as magnifying a portion of an image. There are no restrictions on line offset values other than that they must be 16-bit, two's complement integer values. (Note that this allows negative offsets. You can use this to flip an image vertically.)

The input and output image height and width values are the height in lines and width in pixels per line for their respective images. The height and width are 16-bit positive binary numbers between 0 and 64K-1.

The integer increment and fraction increment values are the scaling parameters. The integer value is a 16-bit integer, and the fraction value is a positive binary fraction between 0 and 0.99999+. For up scaling (output image bigger), the increment value is the inverse of the scaling value. If you are upscaling by a factor of 2.5, the increment value is 1/2.50 = 0.40: the integer increment value is 0 and the fraction increment value is 0.40. For down scaling, the increment value is equal to the scaling value. If you are down scaling by 2.5 (output image smaller), the integer increment value is 2 and the fraction increment value is 0.500.

To perform scaling, the integer and fractional increment values must be generated and placed in the parameter table. The simplest way to generate these values in common computer languages such as C is as follows:

1. Generate the increment value as a floating point number = input width / output width.
2. Multiply the increment value by 65536.
3. Convert the result to a long integer (32 bits). The upper 16 bits of the long integer will be the integer increment value, and the lower 16 bits will be the fractional value.
4. Store the 32-bit long integer in the parameter table as the combined integer and fractional increment values.

The start fraction defines the starting value in the scaling counter for each line. It is a 16-bit, two's complement fractional value between -0.500 and +0.49999. The start fraction allows the input data to be offset by up to half a pixel, referred to the input pixel grid. It is "0" for Y and for UV co-sited data, and is set to "-0.25" (C000h) for interspersed to co-sited conversion of U and V data. The "-0.25" value effectively shifts the U and V data toward the start of the line by 1/4 pixel, the amount required for conversion.

14.6.9.3 Control Word Format

The control word provides bit fields which affect the horizontal filtering operation. The format of the control word is as follows.

Bit   Name     Function
15    Bypass   Bypass filter. Picks nearest input pixel and passes it to output unfiltered. When Bypass is set and the scale factor is 1.0, this results in an image block move.
9     GETB     Large down-scaling bit. Picks nearest input pixels and passes them to the filter. Equivalent to bypass + 5-tap filter of output pixels. LSB value = 0 for filtering.

The Bypass bit causes the data to bypass the 5-tap filter. The scaling operation selects the center pixel, and this pixel is passed to the filter output. No filtering or interpolation is provided. If the scaling factor is "1.0", the result is an image block move where the image is moved from one part of SDRAM to another without modification. If the scaling factor is other than "1.0", the effective algorithm is pixel picking, where the input pixel nearest the output pixel location is used as the output pixel.

The GETB bit is an optional bit for large (> 4) down scaling. When GETB is "0" (normal operation), the 5-tap filter receives the pixel nearest the output pixel as its center pixel plus the two adjacent input pixels on either side of this pixel to form the five filter inputs. When GETB is set, the filter receives the pixel nearest the output pixel as its center pixel plus the two pixels nearest the adjacent output pixels on either side of this pixel to form the five filter inputs. The effective algorithm is pixel picking plus 5-tap filtering of the result. GETB also forces the scaling LSB value to "0", since output pixels are being filtered and no interpolation is used. (See Section 14.5.2, "Filtering.") This is shown in Figure 14-18.

14.6.10 Vertical Filter - SDRAM to SDRAM

This routine performs vertical scaling and filtering of one component (Y, U or V) of an N x M image from one location in SDRAM to another.

14.6.10.1 Algorithms

The routine reads image data from SDRAM using the Y address counter, scales and filters the data in the vertical direction, and writes it back to SDRAM using the Z address counter. The 5-tap filter scales and filters the data. The U LSB register is used as the scaling coefficient register. The U LSB increment value supplied by the parameter table determines the scaling. Lines at the top and bottom of the image are mirrored to provide the extra line data needed by the 5-tap filter.

The routine reads and writes data in 64-byte (one SDRAM block) columns of pixels until the entire image is transferred. For each column, line segments of 64 pixels are processed until the entire column has been processed. Each 64-pixel line segment generated requires five vertically adjacent 64-pixel line segments as input to the 5-tap filter. The routine processes the image in pixel columns to eliminate redundant reads of input pixel data: each new line segment typically requires reading only one new 64-byte line segment.

The routine processes data in 64-pixel blocks, corresponding to the input block buffer sizes. Five buffers are used in processing the current line segment, while the sixth buffer reads in the next line segment in overlap with current processing.

14.6.10.2 Parameter Table

The parameter table, as shown in Figure 14-19, supplies the input and output starting addresses and offsets, the image height in lines and width in pixels, and the scale factor.
[Figure 14-18. Normal vs. large down scaling for scale factor = 5.0. The figure shows rows of input pixels and output pixels for each mode. Normal down scaling: P2N = f(10, 11, 12, 13, 14). Large down scaling: P2L = f(2, 7, 12, 17, 22).]

Figure 14-19. Vertical filter parameter table

Upper 2 bytes              Lower 2 bytes              Description
Input image start address (32 bits)                   Start address of x0y0 (byte address)
U counter start fraction   Input image line offset    Starting value: may be 0.5, etc. for interspersed convert; line offset from x0y0 to x0y1
Fraction increment         Integer increment          Increment value for U = 1/scale factor
Input image height         Input image width          Height and width in input lines and pixels
Output image start address (32 bits)                  Start address of x0y0 (byte address)
Control                    Output image line offset   Control word; line offset from x0y0 to x0y1
Output image height        Output image width         Height and width in output lines and pixels
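The start fraction field that appears in these parameter tables is a 16-bit two's complement fraction, and the horizontal filter text gives one concrete encoding: -0.25 is C000h. The sketch below encodes a start fraction on that basis. It assumes a scale of 65536 per pixel (or line), which is inferred from the C000h example and the stated range of -0.5 to +0.49999. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Encode a start fraction in [-0.5, 0.5) as a 16-bit two's complement
 * fraction. Assumed scale: 1.0 == 65536, so -0.25 -> 0xC000. */
static uint16_t icp_start_fraction(double f)
{
    assert(f >= -0.5 && f < 0.5);
    /* Round to nearest without needing libm. */
    long v = (long)(f * 65536.0 + (f >= 0.0 ? 0.5 : -0.5));
    if (v > 32767)
        v = 32767;                 /* clamp the top of the range */
    return (uint16_t)(int16_t)v;   /* reinterpret as the raw field value */
}
```

With this encoding, `icp_start_fraction(0.0)` yields 0000h for Y and co-sited UV data, and `icp_start_fraction(-0.25)` yields C000h for interspersed to co-sited conversion.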
The input and output addresses are the byte addresses of their respective tables. The input and the output address need to be 64-byte aligned.

The input and output line offsets define the difference in bytes from the address of the first pixel in the first line to the address of the first pixel in the second line for their respective blocks. The line offset must be constant for all lines in each table. It allows some space between the end of one line and the start of the next line. It also allows the ICP to scale and filter a subset of an existing image, such as magnifying a portion of an image. Offset values are 16-bit, two's complement integer values.

Vertical filtering has a restriction on input and output line offset values: they must be positive, and they must be multiples of 64. Note that this only applies to the line-to-line spacing. Even with this restriction, input images may be any height and any width and may start at any byte address. Also, image subsets of arbitrary height and width can be used. As long as the original image has a line offset which is a multiple of 64, all subsets of that image automatically have the same line offset, which is also a multiple of 64. As good programming practice, all images should have line offsets which are multiples of 64, even though this restriction only applies to vertical filtering. If an image does not have a line offset that is a multiple of 64, it can be converted by using horizontal filtering in the image block move mode with the output offset value being a multiple of 64.

The input and output image height and width values are the height in lines and width in pixels per line for their respective images. The height and width are 16-bit positive binary numbers between 0 and 64K-1.

The integer increment and fraction increment values are the scaling parameters. The integer value is a 16-bit integer, and the fraction value is a positive binary fraction between 0 and 0.99999+. For up scaling (output image bigger), the increment value is the inverse of the scaling value. If you are upscaling by a factor of 2.5, the increment value is 1/2.50 = 0.40: the integer increment value is 0 and the fraction increment value is 0.40. For down scaling, the increment value is equal to the scaling value. If you are down scaling by 2.5 (output image smaller), the integer increment value is 2 and the fraction increment value is 0.500.

To perform scaling, the integer and fractional increment values must be generated and placed in the parameter table. The simplest way to generate these values in common computer languages such as C is as follows:

1. Generate the increment value as a floating point number = input height / output height.
2. Multiply the increment value by 65536.
3. Convert the result to a long integer (32 bits). The upper 16 bits of the long integer will be the integer increment value, and the lower 16 bits will be the fractional value.
4. Store the 32-bit long integer in the parameter table as the combined integer and fractional increment values.

The start fraction defines the starting value in the scaling counter for each line. It is a 16-bit, two's complement fractional value between -0.500 and 0.49999+. The start fraction allows the input data to be offset by up to half a line, referred to the input pixel grid. It is set to "0" for all conventional YUV input data.

14.6.10.3 Control Word Format

The control word provides bit fields which affect the vertical filtering operation. The format of the control word is as follows.

Bit   Name     Function
15    Bypass   Bypass filter. Picks nearest input line and passes it to output unfiltered. When Bypass is set and the scale factor is 1.0, this results in an image block move.

The Bypass bit causes the data to bypass the 5-tap filter. The scaling operation selects the center line, and this line is passed to the filter output. No filtering or interpolation is provided. If the scaling factor is 1.0, the result is an image block move where the image is moved from one part of SDRAM to another without modification. If the scaling factor is other than 1.0, the effective algorithm is line picking, where the input line nearest the output line location is used as the output line.

14.6.11 Horizontal Filter with RGB/YUV Conversion to PCI or SDRAM

This routine moves an N x M image in YUV 4:2:2, YUV 4:2:0 or YUV 4:1:1 format from SDRAM to the PCI bus or to SDRAM. The image is scaled and filtered in the horizontal direction during the move. Optional bit masking and/or RGB overlay can be used during the move when PCI output is specified.

14.6.11.1 Algorithms

The routine reads image data from SDRAM using the Y, U, and V address counters, scales and filters the data in the horizontal direction and writes it to the PCI interface or SDRAM. The 5-tap filter scales and filters the data. The LSB increment value for each of the Y, U and V components supplied by the parameter table determines the scaling. Separate scaling factors allow YUV 4:2:2 interspersed to co-sited transformation as the data is being filtered. The scaled and filtered data is converted to RGB or YUV format before being sent to the PCI interface or to SDRAM. In the PCI output case, overlay data with alpha blending and chroma keying can be added to the output image, and the output image can be gated by a bit mask before it is sent to the PCI interface.

The routine reads and writes a line at a time until the full image is transferred. The filter mirrors the ends of each line to provide the extra pixels it needs there.
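The increment values that determine the scaling, for the single-component filters and for each of the Y, U and V components here, follow the four-step recipe given in the parameter table descriptions (ratio, multiply by 65536, convert to a 32-bit integer, split into halves). A minimal C sketch of that recipe is shown below. The function names are hypothetical, and how the two halves are laid out in the parameter word should be taken from the parameter tables, not from this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Steps 1-3: increment = input size / output size, scaled by 65536 and
 * truncated to a 32-bit integer (16.16 fixed point). Use widths for the
 * horizontal filters and heights for the vertical filter. */
static uint32_t icp_increment(uint32_t in_size, uint32_t out_size)
{
    assert(out_size > 0);
    double inc = (double)in_size / (double)out_size;   /* step 1 */
    double scaled = inc * 65536.0;                     /* step 2 */
    assert(scaled < 4294967296.0);                     /* must fit 32 bits */
    return (uint32_t)scaled;                           /* step 3 */
}

/* Upper 16 bits: integer increment. */
static uint16_t icp_increment_int(uint32_t v)
{
    return (uint16_t)(v >> 16);
}

/* Lower 16 bits: fractional increment. */
static uint16_t icp_increment_frac(uint32_t v)
{
    return (uint16_t)(v & 0xFFFFu);
}
```

For the two worked cases in the text: upscaling by 2.5 (100 in, 250 out) gives integer 0 and fraction 6666h (about 0.40), and down scaling by 2.5 (250 in, 100 out) gives integer 2 and fraction 8000h (0.500).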
14.6.11.2 Parameter Table

The parameter table, shown in Table 14-14, supplies the input and output starting addresses and offsets for Y, U, V, OL, B and Z, the image height in lines and width in pixels, and the scale factors for each component.

Table 14-14. Horizontal filter to RGB output parameter table

Upper 2 bytes                Lower 2 bytes                 Description
Input image Y start address (32 bits)                      Y start address of x0y0 (byte address)
Y counter start fraction     Input image Y line offset     Starting value: may be 0.5, etc. for interspersed convert; Y line offset from x0y0 to x0y1
Y fraction increment         Y integer increment           Increment value for Y = 1/scale factor
Y input image height         Y input image width           Y height and width in pixels
Input image U start address (32 bits)                      U start address of x0y0 (byte address)
U counter start fraction     Input image U line offset     Starting value: may be 0.5, etc. for interspersed convert; U line offset from x0y0 to x0y1
U fraction increment         U integer increment           Increment value for U = 1/scale factor
U input image height         U input image width           U height and width in pixels
Input image V start address (32 bits)                      V start address of x0y0 (byte address)
V counter start fraction     Input image V line offset     Starting value: may be 0.5, etc. for interspersed convert; V line offset from x0y0 to x0y1
V fraction increment         V integer increment           Increment value for V = 1/scale factor
V input image height         V input image width           V height and width in pixels
Output image start address (32 bits)                       Start address of x0y0 (byte address)
Control                      Output image line offset      Input and output formats and control bits; line offset from x0y0 to x0y1
Output image height          Output image width            Height and width in output pixels
Bit map image start address (32 bits)                      Start address of x0y0 (byte address)
0                            Bit map image line offset     Line offset from x0y0 to x0y1
RGB overlay start address (32 bits)                        Start address of x0y0 (byte address)
Alpha 1 & Alpha 0            Overlay line offset           Alpha 1 & Alpha 0 blend code for RGB15+, etc.; line offset from x0y0 to x0y1
Overlay end pixel            Overlay start pixel           Start and end pixels along line
Overlay end line             Overlay start line            Start and end lines in frame

The input and output addresses are the byte addresses of their respective tables. They do not need to be word or block aligned. Note the following restriction: in packed RGB24 to PCI operation the output address offset from the start of video memory must be a multiple of 6 bytes, i.e. on an even pixel boundary.

The input and output line offsets define the difference in bytes from the address of the first pixel in the first line to the address of the first pixel in the second line for their respective blocks. The line offset must be constant for all lines in each table. The line offset allows some space between the end of one line and the start of the next line. It also allows the ICP to scale and filter a subset of an existing image, such as magnifying a portion of an image. There are no restrictions on line offset values other than that they must be 16-bit, two's complement integer values. (Note that this allows negative offsets. You can use this to flip an image vertically.)

The input and output image height and width values are the height in lines and width in pixels per line for their respective images. The height and width are 16-bit positive binary numbers between 0 and 64K-1.

The integer increment and fraction increment values are the scaling parameters. There is a separate scaling parameter for each of the Y, U and V input components. The integer value is a 16-bit integer, and the fraction value is a positive binary fraction between 0 and 0.99999+. For up scaling (output image bigger), the increment value is the inverse of the scaling value. If upscaling by a factor of 2.5, the increment value is 1/2.50 = 0.40: the integer increment value is "0" and the fraction increment value is "0.40". For down scaling, the increment value is equal to the scaling value. If you are down scaling by 2.5 (output image smaller), the integer increment value is "2" and the fraction increment value is "0.500".

To perform scaling, the integer and fractional increment values must be generated and placed in the parameter table. The simplest way to generate these values in common computer languages such as C is as follows:

1. Generate the increment value as a floating point number = input width / output width.
2. Multiply the increment value by 65536.
3. Convert the result to a long integer (32 bits). The upper 16 bits of the long integer will be the integer increment value, and the lower 16 bits will be the fractional value.
4. Store the 32-bit long integer in the parameter table as the combined integer and fractional increment values.

For YUV 4:2:2 or YUV 4:2:0 input data and RGB output data, the scaling factor for U and V must be twice the scaling factor for Y, unless YUV 4:2:2 sequencing is used for speed. In YUV 4:2:2 or YUV 4:2:0 data, the horizontal components of U and V are half those of Y. The U and V components must be upscaled by 2 to generate a YUV 4:4:4 format internally for YUV to RGB conversion. For YUV 4:1:1 input data, the U and V components must be upscaled by a factor of 4 to generate the required internal YUV 4:4:4 format.

The start fraction defines the starting value in the scaling counter for each line. It is a 16-bit, two's complement fractional value between -0.500 and 0.49999+. The start fraction allows the input data to be offset by up to half a pixel, referred to the input pixel grid. It is "0" for Y and for UV co-sited data, and is set to "-0.25" (C000h) for interspersed to co-sited conversion of U and V data. The "-0.25" value effectively shifts the U and V data toward the start of the line by 1/4 pixel, the amount required for conversion.

The Alpha 1 and Alpha 0 values are 8-bit fields within the 16-bit alpha field. These values are loaded into the Alpha 1 and Alpha 0 registers, respectively, for use by the RGB 15+ and YUV 4:2:2+ overlay formats in alpha blending.

The overlay start and end pixels and lines define the start and end pixels and lines within the output image for the overlay.
The first pixel of the overlay image will be blended with the pixel at the overlay start pixel and overlay start line in the output image.

14.6.11.3 Control Word Format

The control word provides bit fields which affect the horizontal filtering operation. The format of the control word is as follows.

Bits   Name     Function
15     Bypass   Normally set to 0 to enable filtering. Can be set to 1 to accomplish data move without filtering.
14     422SEQ   4:2:2 sequence bit. Used with YUV 4:2:2 output.
13     YUV420   YUV 4:2:0 input format.
12     OEN      Overlay enable. Valid only for PCI output.
11     PCI      PCI output enable. Otherwise SDRAM output.
10     BEN      Bit mask enable. Valid only for PCI output.
9      GETB     Large down scaling bit. Picks five input pixels nearest 5 output pixels and passes them to the filter. Equivalent to filter bypass + 5-tap filter of output pixels. LSB value = 0 for filtering.
8      OLLE     Overlay little endian enable.
7-6    OFRM     Overlay format: 0 = RGB 24+, 1 = RGB 15+, 2 = YUV 4:2:2+.
5      CHK      Chroma keying enable.
4      LE       RGB output little endian enable.
3-0    RGB      RGB output code: 0 = YUV 4:2:2+, 1 = YUV 4:2:2, 2 = RGB 24+, 3 = RGB 24 packed, 4 = RGB 8A (RGB 233), 5 = RGB 8R (RGB 332), 6 = RGB 15+, 7 = RGB 16.

The 422SEQ bit controls the internal sequencing of the YUV to RGB operation. It is set to "1" when YUV 4:2:2 output is selected. When 422SEQ is "0", normal RGB output is assumed. In this mode, the input is YUV 4:2:2 or YUV 4:2:0, and the output is RGB. To generate the RGB output, the YUV 4:2:2 or YUV 4:2:0 input must be upscaled to YUV 4:4:4 before conversion to RGB. This means the scaling factor for U and V must be twice the scaling factor for Y. The internal sequencing of the filter in this case is UVY, UVY, UVY to generate RGB, RGB, RGB. For YUV 4:2:2 output formats, no upscaling of U and V is required. In this case, the 422SEQ bit is set to one, and the filter sequence is UVYY, UVYY, UVYY.
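As an illustration of the bit assignments listed above, the control word can be assembled from flag bits plus the OFRM and RGB fields. The macro and function names below are hypothetical. The bit positions are those of the control word list, and the check that OEN and BEN are only used with PCI output reflects the "valid only for PCI output" remarks.

```c
#include <assert.h>
#include <stdint.h>

/* Single-bit flags, positions from the control word format list. */
#define ICP_CW_BYPASS  (1u << 15)
#define ICP_CW_422SEQ  (1u << 14)
#define ICP_CW_YUV420  (1u << 13)
#define ICP_CW_OEN     (1u << 12)
#define ICP_CW_PCI     (1u << 11)
#define ICP_CW_BEN     (1u << 10)
#define ICP_CW_GETB    (1u << 9)
#define ICP_CW_OLLE    (1u << 8)
#define ICP_CW_CHK     (1u << 5)
#define ICP_CW_LE      (1u << 4)

/* Combine flags with the overlay format (bits 7-6) and the output
 * code (bits 3-0). Only the codes listed in the text are accepted. */
static uint16_t icp_hfilter_rgb_control(unsigned flags, unsigned ofrm,
                                        unsigned rgb)
{
    assert(ofrm <= 2);          /* 0 = RGB 24+, 1 = RGB 15+, 2 = YUV 4:2:2+ */
    assert(rgb <= 7);           /* 0 = YUV 4:2:2+ ... 7 = RGB 16 */
    assert((flags & ~0xFF30u) == 0);   /* no bits outside the flag set */
    if (flags & (ICP_CW_OEN | ICP_CW_BEN))
        assert(flags & ICP_CW_PCI);    /* overlay/bit mask need PCI output */
    return (uint16_t)(flags | (ofrm << 6) | rgb);
}
```

For example, little-endian RGB 16 output to PCI with no overlay is `icp_hfilter_rgb_control(ICP_CW_PCI | ICP_CW_LE, 0, 7)`, which evaluates to 0817h.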
The 422SEQ bit can be set in RGB output mode to decrease the processing time for the image at the expense of color bandwidth and some corresponding decrease in picture quality. If the 422SEQ bit is set for RGB output, the filter will perform the UVYY sequence. In this case, the U and V components are not upscaled by 2, and the YUV to RGB converter updates its U and V components every other pixel. In the normal case (422SEQ=0), it takes 6 clock cycles to generate two RGB pixels. In the 422SEQ=1 case, it takes 4 clock cycles to generate two RGB pixels, reducing processing time by 33%.

The YUV420 bit indicates that the input data is in YUV 4:2:0 format. In YUV 4:2:0 format, the U and V components are half the width and half the height of the Y data. YUV 4:2:0 data is normally converted to YUV 4:2:2 data by a separate vertical upscaling by a factor of 2.0 for best quality. The YUV420 bit allows using YUV 4:2:0 data directly but with some quality degradation. When YUV420 is set, the ICP upscales the data vertically by line duplication. Each U and V input line is used twice. The separate vertical scaling step is eliminated at the expense of some quality, since the lines are simply duplicated rather than being fully scaled and filtered.

The OEN bit enables overlay. Set it to "1" if an overlay is used, "0" if not. Overlays are only valid for PCI output.

The PCI bit selects PCI as the output port for the ICP data. A "1" selects PCI output; a "0" selects SDRAM output.

The BEN bit enables bit masking. Set it to "1" if bit masking is used, "0" if not. Bit masking is only valid for PCI output.

The GETB bit is an optional bit for large (> 4) down scaling. When GETB is "0" (normal operation), the 5-tap filter receives the pixel nearest the output pixel as its center pixel plus the two adjacent input pixels on either side of this pixel to form the five filter inputs. When GETB is set, the filter receives the pixel nearest the output pixel as its center pixel plus the two adjacent output pixels on either side of this pixel to form the five filter inputs. The effective algorithm is pixel picking plus 5-tap filtering of the result. GETB also forces the scaling LSB value to "0", since output pixels are being filtered and no interpolation is used.

The OFRM bit field selects the overlay data format, as shown in the control word format list.

The CHK bit enables chroma keying. Set it to "1" if chroma keying is used, "0" if not.

The OLLE bit sets the endian-ness of the overlay data input. Set it to "1" if the overlay data is little-endian, "0" if big-endian. This bit is normally set to the same value as the LE bit in the status register.

The LE bit sets the endian-ness of the RGB/YUV output data. Set it to "1" if the output data is little-endian, "0" if big-endian. The LE bit is normally set to the same value as the LE bit in the status register.

The RGB field defines the output data format, as shown in the control word format list.
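The difference between normal and GETB filtering can be illustrated with the numbers from Figure 14-18 (scale factor 5.0, output pixel 2, center input pixel 12). The sketch below only selects the five filter input indices. It assumes an integer step between output pixels and ignores the mirroring at line ends, and the function name is hypothetical.

```c
#include <assert.h>

/* Select the five filter input pixel indices around a center pixel.
 * getb == 0: the two adjacent input pixels on either side.
 * getb != 0: the input pixels nearest the two adjacent output pixels on
 *            either side, i.e. spaced by the (integer) down-scale step. */
static void icp_filter_taps(int center, int step, int getb, int taps[5])
{
    assert(step >= 1);
    int stride = getb ? step : 1;
    for (int j = -2; j <= 2; j++)
        taps[j + 2] = center + j * stride;
}
```

With center 12 and step 5 this reproduces the figure: 10, 11, 12, 13, 14 for normal down scaling and 2, 7, 12, 17, 22 for large down scaling.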
Important note: The ICP DMA enable bit (IE) in the BIU_CTL register of the PCI interface must be set for RGB output to PCI. Set this bit before initiating RGB to PCI operations; otherwise the ICP will stall waiting for the PCI to become ready.
Chapter 15: Variable Length Decoder

by Gene Pinkston and Selliah Rathnam

15.1 VLD Overview

In this document, the generic PNX1300 name refers to the PNX1300 Series, or the PNX1300/01/02/11 products.

The Variable Length Decoder (VLD) unit Huffman-decodes MPEG-1 and MPEG-2 (main profile) video bitstreams[1-3]. This chapter describes a programmer's view of the VLD. The VLD reads an MPEG stream from SDRAM, decodes the bitstream under the control of the DSPCPU and outputs two data streams. The output data streams contain macroblock header information and the run-length encoded DCT coefficients. The output data streams are stored in SDRAM buffers. The VLD unit operates independently during the slice decoding process. The remaining decoding of the MPEG stream is carried out by the DSPCPU.

[Figure 15-1. VLD block diagram. Labeled blocks: hwy_bus, 64-byte read buffer, shifter, start_code_detector, macroblock decode blocks (mb_addr, mb_type, cbp, dmv & motion, dct_lum, dct_chr, dctcoef (0), dctcoef (1), escape_codes), VLD flow control, DMA engine, control, status, MMIO & configuration registers, interrupt, and two 64-byte write FIFOs (header and run-level).]

15.2 VLD Operation

Enabled by the DSPCPU, the VLD unit can be initialized by hardware or software reset operations. Hardware reset is provided by the external TRI_RESET# pin. Software reset is provided by one of the VLD commands.

The DSPCPU controls the VLD through the VLD command register. There are five commands supported by the VLD:

- Shift the bitstream by some number of bits (a maximum of 15-bit shift)
- Search for the next start code
- Reset the VLD
- Parse some number of macroblocks
- Flush VLD output buffers to SDRAM

The normal mode of operation is for the DSPCPU to request that the VLD parse some number of macroblocks. Once the VLD has begun parsing macroblocks, it may stop for any one of the following reasons:

- The command was completed with no exceptions
- A start code was detected
- An error was encountered in the bitstream
- The VLD input DMA completed, and the VLD is stalled waiting for more data
- One of the VLD output DMAs has completed and the VLD is stalled because the output FIFO is full

The DSPCPU can be interrupted whenever the VLD halts. Consider the case in which the VLD has encountered a start code. At this point, the VLD will halt and set the status flag to indicate that a start code has been detected. This event will generate an interrupt to the DSPCPU (if the corresponding interrupt is enabled). Upon entering the interrupt routine, the DSPCPU will read the VLD status register to determine the source of the interrupt. Once it has determined that a start code was encountered, the DSPCPU will read 8 bits from the VLD shift register to determine the type of start code encountered. If it is a "slice" start code, the DSPCPU reads from the shift register the slice quantization scale and any extra slice information. The slice quantization scale is then written back to the VLD quantizer-scale register. Before exiting the interrupt routine, the DSPCPU will clear the start code detected status bit in the status register and issue a new command to process the remaining macroblocks.

15.3 Decoding up to a Slice

MPEG decoding up to the slice layer is carried out by the DSPCPU and the VLD. The VLD is controlled by the DSPCPU for the decoding of all parameters up to the slice-start code. During this process, the DSPCPU reads from the VLD_SR register, which contains the next 16 bits of the bitstream. The DSPCPU also issues shift commands to the VLD in order to advance the contents of the shift register by the specified number of bits. The DSPCPU may also command the VLD to advance to the next start code. Refer to Table 15-6 for a complete list of VLD commands and their functions.
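The read-then-shift interaction described above can be modeled in software. The sketch below is a host-side model only, not driver code: `vld_model`, `vld_model_peek16` and the other names are hypothetical stand-ins for reading VLD_SR and issuing a shift command, and MSB-first bit order is assumed. It shows how a field wider than 15 bits has to be read in several steps because a single shift command advances at most 15 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Software model of the bitstream position (hypothetical). */
typedef struct {
    const uint8_t *buf;
    size_t len;      /* buffer length in bytes */
    size_t bitpos;   /* current bit offset from the start of buf */
} vld_model;

/* Model of reading VLD_SR: the next 16 bits, MSB first, zero padded. */
static uint16_t vld_model_peek16(const vld_model *v)
{
    uint16_t w = 0;
    for (int i = 0; i < 16; i++) {
        size_t p = v->bitpos + (size_t)i;
        unsigned bit = 0;
        if (p / 8 < v->len)
            bit = (v->buf[p / 8] >> (7 - p % 8)) & 1u;
        w = (uint16_t)((w << 1) | bit);
    }
    return w;
}

/* Model of the shift command: at most 15 bits per command. */
static void vld_model_shift(vld_model *v, unsigned n)
{
    assert(n <= 15);
    v->bitpos += n;
}

/* Read n bits (n <= 32) as a sequence of peek + shift steps. */
static uint32_t vld_model_read_bits(vld_model *v, unsigned n)
{
    assert(n <= 32);
    uint32_t out = 0;
    while (n > 0) {
        unsigned k = n > 15 ? 15 : n;
        uint32_t top = (uint32_t)vld_model_peek16(v) >> (16 - k);
        out = (out << k) | top;
        vld_model_shift(v, k);
        n -= k;
    }
    return out;
}
```

In this model, reading a 24-bit start code prefix takes two steps (15 + 9 bits), after which an 8-bit read returns the start code value.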
Once at the slice layer, the VLD operates independently for the entire slice decoding. The slice decoding starts once the DSPCPU issues a parse command.

15.4 VLD Input

Input to the VLD is controlled by the VLD input DMA engine. The input DMA engine is programmed by the DSPCPU to read from SDRAM. The DSPCPU programs this DMA engine by writing the address and the length of the SDRAM buffer containing the MPEG stream. The address of the buffer is written to the VLD_BIT_ADR register. The length, in bytes, of the buffer is written to the VLD_BIT_CNT register.

[Figure 15-2. MPEG-2 macroblock header output format. Six 32-bit words (W0 to W5) per macroblock. Field groups shown, in figure order: esc count, mba inc, mb type, mot type, dct type, mv count, mv format, dmv; first forward motion vector (mv field sel[0][0], motion code[0][0][0], motion residual[0][0][0], motion code[0][0][1], motion residual[0][0][1]); second forward motion vector, MPEG-2 only (the [1][0] fields); first backward motion vector (the [0][1] fields); second backward motion vector, MPEG-2 only (the [1][1] fields); quant scale, cbp, dmvector[0], dmvector[1].]
The VLD reads data from SDRAM into an internal 64-byte FIFO. The VLD processing engine then reads data from the FIFO as needed. Once this internal FIFO is empty, the VLD reads more data from SDRAM. The VLD_BIT_ADR and VLD_BIT_CNT registers are updated after each read from main memory. The content of the VLD_BIT_ADR register reflects the next address from which the bitstream data will be fetched. The content of the VLD_BIT_CNT register reflects the number of bytes remaining to be read before the current transfer is complete. When the number of bytes remaining to be read from SDRAM is zero, a status flag is set and an interrupt can be generated to the DSPCPU. The DSPCPU will provide the new bitstream buffer address and the number of bytes in the bitstream buffer to the VLD.

15.5 VLD Output

The VLD outputs two data streams which are written back to main memory by two output DMA engines. These DMA engines are programmed by the DSPCPU. One of the output streams contains macroblock header information and the other contains run-length encoded DCT coefficients. Each DMA engine contains a 64-byte FIFO which is transferred to main memory once it is full.

The main memory address and count for the macroblock header output are contained in the VLD_MBH_ADR and VLD_MBH_CNT registers respectively. The main memory address and count for the DCT coefficient output are contained in the VLD_RL_ADR and VLD_RL_CNT registers respectively. The counts for both the macroblock header and coefficient data are expressed in terms of 32-bit (4-byte) words.

15.5.1 Macroblock Header Output Data

For each MPEG-2 macroblock parsed by the VLD, six 32-bit words of macroblock header information will be output from the VLD. Figure 15-2 pictures the layout of the VLD output, and the fields are described in Table 15-1. Note that these fields may or may not be valid depending upon the MPEG-2 video standard[2].
For example, motion vectors are not valid for intra coded macroblocks. Similarly, 'dct type' is not valid for field pictures.

For each MPEG-1 macroblock parsed by the VLD, four 32-bit words of macroblock header information are output from the VLD. Figure 15-3 shows the layout of the VLD output, while the fields are described in Table 15-2. Note that these fields may or may not be valid depending upon the MPEG-1 video standard [2].

Table 15-1. References for the MPEG-2 macroblock header data
Item | Default value | References from MPEG-2 video standard, IS 13818-2 document
esc count | 0 | Section 6.2.5
mba inc | - | Section 6.2.5 and Table B-1
mb type | undefined | Section 6.2.5.1 and Tables B-2, B-3, and B-4; only the 5 MSBs from the tables are used
mot type | undefined | Section 6.2.5.1; field or frame motion type will be decided by the user
dct type | undefined | Section 6.2.5.1
mv count | undefined | Tables 6-17 and 6-18. The mv count value is one less than the value from the tables.
mv format | undefined | Tables 6-17 and 6-18
dmv | undefined | Tables 6-17 and 6-18
mv field sel[0][0] to mv field sel[1][1] | undefined | Sections 6.2.5 and 6.2.5.2
motion code[0][0][0] to motion code[1][1][1] | undefined | Section 6.2.5.2.1 and Table B-10
motion residual[0][0][0] to motion residual[1][1][1] | undefined | Section 6.2.5.2.1; the corresponding rsize bits are extracted from the bitstream and stored left justified; to get the final value, shift the given number by (8 - corresponding rsize). The rsize values are stored in the VLD_PI register.
dmvector[1] and dmvector[0] | undefined | Section 6.2.5.2.1 and Table B-11; signed 2-bit integer from Table B-11
cbp | - | Sections 6.2.5, 6.2.5.3 and Table B-9
quant scale | - | Section 6.2.5; 5 bits from the bitstream; use Table 7-6 to compute the quant scale value

Table 15-2. References for the MPEG-1 macroblock header data
Item | Default value | References from IS 11172-2 document
esc count | 0 | Section 2.4.3.6
mba inc | - | Section 2.4.3.6
mb type | undefined | Section 2.4.3.6 and Tables B-2a to B-2d
motion code[0][0][0] to motion code[0][1][1] | undefined | Section 2.4.2.7 and Table B-4
motion residual[0][0][0] to motion residual[0][1][1] | undefined | Section 2.4.2.7; the corresponding rsize bits are extracted from the bitstream and stored left justified; to get the final value, shift the given number by (8 - corresponding rsize). The rsize values are stored in the VLD_PI register.
cbp | - | Section 2.4.3.6 and Table B-3
quant scale | - | Section 2.4.2.7
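The motion residual rule in Tables 15-1 and 15-2 can be sketched in C as follows. This is a minimal illustration, not vendor code: it assumes the residual occupies an 8-bit field, that "left justified" means the valid bits sit at the top of that field (so the shift is a right shift), and that rsize values outside 1 to 8 carry no residual. The function name is hypothetical.

```c
#include <stdint.h>

/* Right-align a left-justified motion residual.
 * field8: the 8-bit residual field taken from the macroblock header word
 *         (assumed field width, inferred from the "8 - rsize" rule).
 * rsize:  number of valid residual bits, as programmed in VLD_PI.
 * Returns the residual value, or 0 when rsize is outside 1..8
 * (treated here as "no residual present"; this is an assumption). */
static inline uint32_t vld_motion_residual(uint32_t field8, unsigned rsize)
{
    if (rsize == 0u || rsize > 8u)
        return 0u;
    return (field8 & 0xFFu) >> (8u - rsize);
}
```

For instance, with rsize = 3 a field value of 0xA0 (binary 1010 0000) yields the 3-bit residual 101.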
15.5.2 Run-level output data

The DCT coefficients associated with the macroblock are output to a separate memory area. Each DCT coefficient is represented as one 32-bit quantity (16 bits of run and 16 bits of level). For intra blocks, the DC term is expressed as 16 bits of DC size and a 16-bit value whose most significant bits represent the DC level (the number of bits used for the DC level is determined by DC size). Each block of DCT coefficients is terminated by a run value of '0xff'.

15.6 VLD time sharing

The PNX1300 VLD is targeted at decoding a single bitstream, and there is no provision to decode more than one bitstream at a time by using the VLD in time-multiplexed mode. However, internal development has shown that up to 4 simultaneous MPEG-1 bitstreams can be decoded. This procedure is beyond the scope of this databook but can be discussed further by contacting customer support.

15.7 MMIO registers

To ensure compatibility with future devices, any undefined MMIO bits should be ignored when read, and written as '0's.

15.7.1 VLD status (VLD_STATUS)

This register contains the current status information most pertinent to the normal operation of an MPEG video decode application. The VLD status bits are described in Table 15-3 and pictured in Figure 15-4. Default value (after hardware reset) is '0'.

Interrupts can be enabled for any of the defined status bits (see the VLD_IMASK description below). An interrupt is acknowledged by writing a '1' to the corresponding bit in the VLD_STATUS register. Writing a '1' to any of bits 1 through 5 clears the corresponding bit. However, bit 0 (COMMAND_DONE) is cleared only by issuing a new command. Writing a '0' to bit 0 of the status register will result in undefined behavior of the VLD. Note that several status bits may be asserted simultaneously.
Thus it is recommended to use level triggered interrupts (see Section 3.5.3.6) and to acknowledge the interrupt carefully.

15.7.2 VLD interrupt enable (VLD_IMASK)

This register allows the DSPCPU to control the initiation of the interrupt for the corresponding bits in the VLD status register. Writing a '1' into any of the defined VLD_IMASK bits enables the interrupt for the corresponding bit in the status register (VLD_STATUS). Default value (after hardware reset) is '0'.

Figure 15-3. MPEG-1 macroblock header output format. [Figure not reproduced: four 32-bit words per macroblock. One word carries esc count, mba inc and mb type. Two words carry the first forward and first backward motion vectors, each made up of two motion codes and two motion residuals. One word carries quant scale and cbp.]
15.7.3 VLD control (VLD_CTL)

The VLD_CTL register has one bit indicating the endian-ness of the VLD unit. Little-endian = '1', big-endian = '0'. Default value (after hardware reset) is '0'.

15.8 VLD DMA registers

There is one input DMA engine and two output DMA engines in the VLD block. Each of the three DMA engines (or channels) is controlled by two MMIO registers. The address register always contains the address of the next SDRAM transaction. The count register always indicates the amount of data still to be transferred to or from main memory. A DMA completes when its count reaches zero. Once a DMA count register becomes zero, a bit is set in the status register and the DSPCPU can be interrupted. The DSPCPU writes a non-zero value to a DMA count register to initiate a new DMA transaction. The input count register always contains the number of bytes to be fetched from main memory. The output count registers always contain the number of words (4 bytes) to be written to main memory.

Note that both DMA output engines write only to 64-byte aligned addresses and always write 64 bytes. When the DMA output FIFOs are flushed, there may not be 64 bytes of valid data at the time the flush command is received. In this case, 64 bytes are still written to main memory. The number of valid bytes can be determined from the count register value before issuing the flush command. The valid data always resides in the first n bytes, while the last 64-n bytes contain random data and should be ignored.

15.8.1 DMA input

The bitstream input to the VLD is controlled by the VLD_BIT_ADR and VLD_BIT_CNT MMIO registers. VLD_BIT_ADR contains the main memory address for the next read from main memory to the VLD input FIFO. VLD_BIT_CNT contains the number of bytes remaining to be read before the current DMA is completed. The VLD input address is byte aligned.
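The unit and alignment rules above (input count in bytes, output counts in 32-bit words, output addresses 64-byte aligned) can be captured in a small sketch. Everything here is illustrative: the struct is a plain in-memory stand-in for the six DMA registers rather than real MMIO, the function names are hypothetical, and requiring output buffer sizes to be a multiple of 64 bytes is a precaution of this sketch (because the engines always write 64 bytes), not a rule stated in the text. The count is written last because a non-zero count is what initiates the DMA.

```c
#include <stdint.h>

/* Software stand-in for the VLD DMA registers (not a real MMIO mapping). */
typedef struct {
    uint32_t bit_adr, bit_cnt;   /* input: byte address, count in bytes      */
    uint32_t mbh_adr, mbh_cnt;   /* macroblock header out: count in words    */
    uint32_t rl_adr,  rl_cnt;    /* run-level out: count in 32-bit words     */
} vld_dma_regs;

/* Program the input channel. Returns 0 on success, -1 on a zero length. */
static inline int vld_program_input(vld_dma_regs *r, uint32_t addr, uint32_t bytes)
{
    if (bytes == 0u)
        return -1;
    r->bit_adr = addr;           /* byte aligned: any address is accepted    */
    r->bit_cnt = bytes;          /* non-zero count starts the transfer       */
    return 0;
}

/* Program one output channel. Returns 0 on success, -1 if the buffer is
 * not 64-byte aligned or its size is not a non-zero multiple of 64 bytes. */
static inline int vld_program_output(uint32_t *adr_reg, uint32_t *cnt_reg,
                                     uint32_t addr, uint32_t bytes)
{
    if ((addr % 64u) != 0u || bytes == 0u || (bytes % 64u) != 0u)
        return -1;
    *adr_reg = addr;
    *cnt_reg = bytes / 4u;       /* output counts are in 4-byte words        */
    return 0;
}
```

On real hardware the registers would be accessed through volatile pointers at the MMIO offsets; that part is deliberately left out.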
15.8.2 Macroblock header output DMA

The macroblock header output of the VLD is controlled by the VLD_MBH_ADR and VLD_MBH_CNT registers. VLD_MBH_ADR contains the address of the next write of macroblock header data to main memory. VLD_MBH_CNT contains the remaining number of words (4 bytes) to write before the current DMA expires. The macroblock header output address is 64-byte aligned.

15.8.3 Run-level output DMA

The run-level output of the VLD is controlled by VLD_RL_ADR and VLD_RL_CNT. VLD_RL_ADR contains the address of the next write of run-level data to main memory. VLD_RL_CNT contains the number of 4-byte writes remaining before the current DMA expires. The run-level buffer address is 64-byte aligned.

Table 15-3. VLD_STATUS register
Name | Size (bits) | Description
COMMAND_DONE | 1 | Indicates successful completion of the current command
STARTCODE | 1 | VLD encountered 0x000001 while executing a parse or next start code command
ERROR | 1 | VLD encountered an illegal Huffman code or an unexpected start code
DMA_IN_DONE | 1 | DMA transfer of the given SDRAM buffer has completed and the VLD is stalled waiting for more main memory input data; the DSPCPU is responsible for providing the new SDRAM buffer to the VLD
MBH_OUT_DONE | 1 | Macroblock header DMA transfer has completed
RL_OUT_DONE | 1 | Run-level DMA transfer has completed

Table 15-4. VLD control (r/w)
Name | Size (bits) | Description
reserved | 1 |
little endian | 1 | Forces the VLD to operate in little-endian mode when set to 1
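As a rough illustration of how a driver might react to the status bits in Table 15-3, the sketch below maps a VLD_STATUS value to a single next action. Only bit 0 (COMMAND_DONE) is pinned down by the text; the positions of the other five flags are assumed to follow the table order (bits 1 to 5), and both the action names and the priority order are choices of this sketch, not documented behavior.

```c
#include <stdint.h>

/* Assumed bit positions: bit 0 is stated in the text, bits 1..5 are
 * inferred from the order of Table 15-3. */
enum {
    VLD_ST_COMMAND_DONE = 1u << 0,
    VLD_ST_STARTCODE    = 1u << 1,
    VLD_ST_ERROR        = 1u << 2,
    VLD_ST_DMA_IN_DONE  = 1u << 3,
    VLD_ST_MBH_OUT_DONE = 1u << 4,
    VLD_ST_RL_OUT_DONE  = 1u << 5
};

typedef enum {
    VLD_NEXT_WAIT,              /* nothing pending                          */
    VLD_NEXT_HANDLE_ERROR,      /* bitstream error: recover first           */
    VLD_NEXT_HANDLE_STARTCODE,  /* command ended early on a start code      */
    VLD_NEXT_REFILL_INPUT,      /* VLD stalled: supply a new input buffer   */
    VLD_NEXT_NEW_OUTPUT_BUFFER, /* an output DMA expired: supply a buffer   */
    VLD_NEXT_ISSUE_COMMAND      /* command completed successfully           */
} vld_next_action;

/* Several flags may be set at once, so pick one action by priority. */
static inline vld_next_action vld_next(uint32_t status)
{
    if (status & VLD_ST_ERROR)        return VLD_NEXT_HANDLE_ERROR;
    if (status & VLD_ST_STARTCODE)    return VLD_NEXT_HANDLE_STARTCODE;
    if (status & VLD_ST_DMA_IN_DONE)  return VLD_NEXT_REFILL_INPUT;
    if (status & (VLD_ST_MBH_OUT_DONE | VLD_ST_RL_OUT_DONE))
        return VLD_NEXT_NEW_OUTPUT_BUFFER;
    if (status & VLD_ST_COMMAND_DONE) return VLD_NEXT_ISSUE_COMMAND;
    return VLD_NEXT_WAIT;
}
```

A level-triggered handler would typically call such a function in a loop until it returns the wait action, since more than one flag can be asserted at a time.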
Figure 15-4. VLD MMIO registers layout. [Figure not reproduced. Register offsets from MMIO_BASE, with the field names shown in the figure:]
VLD_COMMAND (r/w) | 0x10 2800 | command, count
VLD_SR (r) | 0x10 2804 | value
VLD_QS (r/w) | 0x10 2808 | qs
VLD_PI (r/w) | 0x10 280C | vbrs, hbrs, vfrs, hfrs, mpeg2, conceal_mv, intra_vlc, fpfd, pict_struc, pict_type
VLD_STATUS (r) | 0x10 2810 | rl_out_done, mbh_out_done, dma_in_done, error, startcode, command_done
VLD_IMASK (r/w) | 0x10 2814 | int. enables
VLD_CTL (r/w) | 0x10 2818 | little_endian
VLD_BIT_ADR (r/w) | 0x10 281C | bit_adr
VLD_BIT_CNT (r/w) | 0x10 2820 | bit_cnt
VLD_MBH_ADR (r/w) | 0x10 2824 | mbh_adr
VLD_MBH_CNT (r/w) | 0x10 2828 | mbh_cnt
VLD_RL_ADR (r/w) | 0x10 282C | rl_adr
VLD_RL_CNT (r/w) | 0x10 2830 | rl_cnt
15.9 VLD operational registers

15.9.1 VLD command (VLD_COMMAND)

This register indicates the next action to be taken by the VLD. Some commands have an associated count which resides in the least significant 8 bits of this register. There are currently five commands recognized by the VLD block:
- Shift the bitstream by 'count' bits ('count' must be less than or equal to 15)
- Parse 'count' un-skipped macroblocks
- Search for the next start code
- Reset the VLD
- Flush the VLD output buffers

The DSPCPU must wait for the VLD to halt before the next command can be issued. Note that there are several ways in which a command may complete. Only a successful completion is indicated by the COMMAND_DONE bit in the status register. A command may complete unsuccessfully if a start code or an error is encountered before the requested number of items has been processed. Note also that expiration of a DMA count does not constitute completion of a command. When a DMA count expires, the VLD is stalled as it waits for a new DMA to be initiated. It is not halted. Default value (after hardware reset) is '0'.

The VLD_COMMAND fields are described in Table 15-5 and the different commands are explained in Table 15-6.

15.9.2 VLD shift register (VLD_SR)

This read-only register is a shadow of the VLD's operational shift register. It allows the DSPCPU to access the bitstream through the VLD. Bits 0 through 15 are the current contents of the VLD shift register. Bits 16 to 31 are reserved and should be treated as undefined by the programmer.

15.9.3 VLD quantizer scale (VLD_QS)

This 5-bit register contains the quantization scale code (from the slice header) to be output by the VLD until it is overridden by a macroblock quantizer scale code. The quantizer scale code is part of the macroblock header output.

Table 15-5. VLD command register
Name | Size (bits) | Description
count | 8 | Count for the current command
command | 4 | VLD command to be executed

Table 15-6. VLD commands

Shift the bitstream by 'count' bits
  Field coding: 1
  Flags set after completion of the command: COMMAND_DONE or DMA_IN_DONE
  Description: The VLD shifts the given number of bits in its internal shift register. The shift register value is available in the VLD_SR register. The DMA_IN_DONE flag is set when the VLD runs out of data from the input FIFO. The flag is reset by issuing a new command.

Search for the next start code
  Field coding: 3
  Flags set after completion of the command: STARTCODE or COMMAND_DONE or DMA_IN_DONE
  Description: The VLD searches for a start code. The start code has a 0x000001 prefix and an additional 8-bit value. The DMA_IN_DONE flag is set when the VLD runs out of data from the input FIFO. The STARTCODE flag is reset by writing a '1' to the flag. The COMMAND_DONE flag is reset by issuing a new command.

Reset the VLD
  Field coding: 4
  Flags set after completion of the command: none
  Description: Refer to Section 15.12 for more details.

Parse for a given number of macroblocks
  Field coding: 2
  Flags set after completion of the command: COMMAND_DONE or STARTCODE or ERROR or DMA_IN_DONE
  Description: The VLD parses the given number of un-skipped macroblocks and the associated run-level values. Count indicates the remaining macroblocks to parse. Note that this number is slightly inaccurate, since a parsed macroblock can still be in the internal 64-byte FIFO. If the VLD encounters a start code, the parsing action is terminated and the VLD sets only the STARTCODE flag. If the VLD parses the given number of un-skipped macroblocks without encountering a start code, it sets the COMMAND_DONE flag. The ERROR flag is set when the VLD encounters an error while parsing the bitstream. The DMA_IN_DONE flag is set when the VLD runs out of data from the input FIFO. The STARTCODE flag is reset by writing a '1' to the flag. The COMMAND_DONE flag is reset by issuing a new command.

Flush the VLD output buffer
  Field coding: 8
  Flags set after completion of the command: COMMAND_DONE
  Description: The VLD flushes the remaining macroblock header data and the remaining run-level data to SDRAM. The highway byte-enables are used in order to write only the valid data to SDRAM. Only the valid word count values written to SDRAM are subtracted from the VLD_MBH_CNT and VLD_RL_CNT registers.
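The command codes in Table 15-6 and the count rule from Section 15.9.1 can be combined into a small encoder. This is a sketch under one important assumption: the text places the count in the least significant 8 bits but does not give the position of the 4-bit command field, so it is assumed here to sit directly above the count (bits 11:8). The names are hypothetical.

```c
#include <stdint.h>

/* Command codes from Table 15-6. */
enum vld_cmd {
    VLD_CMD_SHIFT          = 1,
    VLD_CMD_PARSE          = 2,
    VLD_CMD_NEXT_STARTCODE = 3,
    VLD_CMD_RESET          = 4,
    VLD_CMD_FLUSH          = 8
};

/* ASSUMPTION: the command field starts at bit 8, right above the count. */
#define VLD_CMD_FIELD_SHIFT 8

/* Build a VLD_COMMAND word. Returns 0 on success, -1 if the count does
 * not fit in 8 bits or a shift of more than 15 bits is requested. */
static inline int vld_encode_command(enum vld_cmd cmd, unsigned count,
                                     uint32_t *out)
{
    if (count > 0xFFu)
        return -1;
    if (cmd == VLD_CMD_SHIFT && count > 15u)
        return -1;
    *out = ((uint32_t)cmd << VLD_CMD_FIELD_SHIFT) | (uint32_t)count;
    return 0;
}
```

A caller would write the resulting word to VLD_COMMAND only after the previous command has halted, as required above.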
15.9.4 VLD picture info (VLD_PI)

This 32-bit register contains the picture layer information necessary for the VLD to parse the macroblocks within that picture. Again, the values for each of these fields are determined by the appropriate standard (MPEG [1-3]).

15.10 Error handling

Upon encountering a bitstream error, the VLD sets the bitstream-error flag (ERROR) in the VLD_STATUS register and interrupts the DSPCPU, if the interrupt is enabled. Note that if a start code is present (in the VLD shift register) when an error is detected, both the start code and the error bits are set. A separate flush command is required to flush any valid data in the run-level and macroblock header output buffers. The DSPCPU de-asserts the error flag by writing a '1' to it.

15.11 Interrupt

The interrupt source number for the VLD is 14, and it should be set in level sensitive mode (see Section 3.5.3.6).

15.12 Reset

The VLD block is reset by a hardware reset or a software reset. The hardware reset signal is generated from the external pin TRI_RESET#. The software reset is initiated by writing a 'reset VLD' command to the VLD_COMMAND register. Refer to Table 15-8 for details of the software reset procedure.

15.13 Endian-ness

The VLD supports little-endian and big-endian modes of operation. Refer to Appendix C for the endian-ness specification of the VLD input and output data.

15.14 Power down

The VLD block can be separately powered down by setting a bit in the BLOCK_POWER_DOWN register. For a description of powerdown, see Chapter 21, 'Power Management'.

The VLD block should not be active when applying block powerdown. If the block enters the power-down state while it is enabled, its behavior upon power-up is undefined.

15.15 References

[1] ISO/IEC IS 13818-2, International Standard (1994), MPEG-2 Video.
[2] ISO/IEC IS 11172-2, International Standard (1992), MPEG-1 Video.
[3] MPEG Video Compression Standard, by Joan L. Mitchell, William B. Pennebaker, Chad E. Fogg, Didier J. LeGall; ITP publication.

Table 15-7. VLD picture info register (r/w)
Name | Size (bits) | Description
pict_type (picture type) | 2 | I, P, or B picture
pict_struc (picture structure) | 2 | Field or frame picture
fpfd (frame prediction frame DCT) | 1 | Specifies that this picture uses only frame prediction and frame DCT
intra_vlc | 1 | Use DCT table zero or one
conceal_mv | 1 | Concealment vectors present in the bitstream
reserved | 6 | Reserved for future expansion
mpeg2 mode | 1 | Switches the VLD between MPEG-1 and MPEG-2 decoding. Value '1' = MPEG-2 mode
reserved | 2 | Reserved
hfrs (horizontal forward rsize) | 4 | Size of residual motion vector
vfrs (vertical forward rsize) | 4 | Size of residual motion vector
hbrs (horizontal backward rsize) | 4 | Size of residual motion vector
vbrs (vertical backward rsize) | 4 | Size of residual motion vector

Table 15-8. Software reset procedure
Cycle no. | Action | Remarks
i | The DSPCPU issues the 'reset the VLD' command by writing the required value to the VLD_COMMAND register. |
i to j | The VLD completes any pending highway transactions. | Highway transactions, once started, are not aborted in the middle.
j+1 | The VLD performs the full reset. | All status and control registers are reset and all the buffers are made empty. MMIO registers initialized to zero include VLD_STATUS.
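The field widths in Table 15-7 add up to exactly 32 bits, which suggests a straightforward packing of VLD_PI. The sketch below assumes the fields are laid out in table order starting at bit 0 (pict_type in bits 1:0 up to vbrs in bits 31:28); the text does not state bit positions explicitly, so treat the offsets as an assumption to check against Figure 15-4. The numeric codings of pict_type and pict_struc are not given in this chapter and are left to the caller. Struct and function names are hypothetical.

```c
#include <stdint.h>

/* Picture-layer parameters, one member per non-reserved field of Table 15-7. */
typedef struct {
    unsigned pict_type;   /* 2 bits */
    unsigned pict_struc;  /* 2 bits */
    unsigned fpfd;        /* 1 bit  */
    unsigned intra_vlc;   /* 1 bit  */
    unsigned conceal_mv;  /* 1 bit  */
    unsigned mpeg2;       /* 1 bit, 1 = MPEG-2 mode */
    unsigned hfrs, vfrs;  /* 4 bits each: forward rsize  */
    unsigned hbrs, vbrs;  /* 4 bits each: backward rsize */
} vld_pi_fields;

/* Pack the fields into a VLD_PI word. Bit offsets are ASSUMED from the
 * table order; reserved bits (12:7 and 15:14) are written as zero. */
static inline uint32_t vld_pi_pack(const vld_pi_fields *f)
{
    return ((uint32_t)(f->pict_type  & 0x3u) << 0)
         | ((uint32_t)(f->pict_struc & 0x3u) << 2)
         | ((uint32_t)(f->fpfd       & 0x1u) << 4)
         | ((uint32_t)(f->intra_vlc  & 0x1u) << 5)
         | ((uint32_t)(f->conceal_mv & 0x1u) << 6)
         | ((uint32_t)(f->mpeg2      & 0x1u) << 13)
         | ((uint32_t)(f->hfrs       & 0xFu) << 16)
         | ((uint32_t)(f->vfrs       & 0xFu) << 20)
         | ((uint32_t)(f->hbrs       & 0xFu) << 24)
         | ((uint32_t)(f->vbrs       & 0xFu) << 28);
}
```

Masking each field before shifting keeps an out-of-range value from spilling into a neighboring field or a reserved bit.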
Chapter 16: I2C Interface
by Essam Abu-Ghoush, Robert Nichols

16.1 I2C overview

In this document, the generic PNX1300 name refers to the PNX1300 series, or the PNX1300/01/02/11 products.

The PNX1300 includes an I2C interface which can be used to control many different multimedia devices such as:
- DMSDs: digital multi-standard decoders
- DENCs: digital encoders
- Digital cameras
- I2C parallel I/O expanders

The key features of the I2C interface are:
- Supports I2C single master mode
- I2C data rate up to 400 kbits/sec
- Support for the 7-bit addressing option of the I2C specification
- Provisions for full software use of the I2C interface pins for implementing software I2C or similar protocols

Note that the I2C pins are also used to load the initial boot parameters and/or code from a serial EEPROM, as described in Section 13, 'System Boot'. The boot logic is only active upon PNX1300 hardware reset and is quiescent afterwards.

A typical system using the I2C interface is presented in Figure 16-1. The PNX1300 is connected as a master to a series of slave devices through SCL and SDA. Note that the bus has one pullup resistor for each of the clock and data lines. The pullup should be set to a voltage no higher than VREF_PERIPH.

16.2 Compared to TM-1000

The following are the main I2C differences from the TM-1000:
- The SEX bit is removed. Endian-ness is fixed.
- The I2C clock rate is closer to 100/400 kHz.
- The GDI bit now correctly indicates write completion.
- Clock stretching is always enabled.

16.3 External interface

The I2C external interface is composed of two signals, as shown in Table 16-1.

16.4 I2C register set

The I2C user interface consists of four registers visible to the programmer. The registers are mapped into the MMIO address space and are fully accessible to the programmer. Figure 16-2 shows the I2C register set.
To ensure compatibility with future devices, any undefined MMIO bits should be ignored when read, and written as '0's.

16.4.1 IIC_AR register

IIC_AR is the I2C address register and is used in both master receive and master transmit modes. This register is written with the address of the I2C slave device and the byte count for the transmit or receive. Table 16-2 lists the bitfield definitions for the IIC_AR register.

Figure 16-1. Typical I2C system implementation. [Figure not reproduced: the PNX1300 and two I2C slaves share the SCL and SDA lines, each of which is pulled up to VREF_PERIPH through a resistor Rp.]

Table 16-1. I2C external interface
Signal | Type | Description
IIC_SDA | I/O | I2C serial data
IIC_SCL | O | I2C clock

Table 16-2. IIC_AR register
Bits | Field name | Definition
31:25 | ADDRESS | 7-bit slave device address
24 | DIRECTION | Read/write control bit
23:16 | reserved | Must be written to '0'
15:8 | COUNT | Byte count of requested transfer
7:0 | reserved | Read as '0'
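Since Table 16-2 gives exact bit positions, building an IIC_AR value is a matter of shifts and masks. The following is a minimal sketch with hypothetical function names; it only computes register values and does not touch hardware.

```c
#include <stdint.h>

/* Build an IIC_AR value per Table 16-2.
 * addr7: 7-bit slave address (bits 31:25)
 * read:  non-zero for an I2C read, zero for a write (bit 24)
 * count: byte count of the transfer (bits 15:8)
 * Reserved bits 23:16 and 7:0 are left at zero. */
static inline uint32_t iic_ar_pack(unsigned addr7, int read, unsigned count)
{
    return ((uint32_t)(addr7 & 0x7Fu) << 25)
         | ((uint32_t)(read ? 1u : 0u) << 24)
         | ((uint32_t)(count & 0xFFu) << 8);
}

/* Extract the (remaining) byte count from an IIC_AR value. */
static inline unsigned iic_ar_count(uint32_t ar)
{
    return (unsigned)((ar >> 8) & 0xFFu);
}
```

One consequence of this layout is that the top byte of IIC_AR has the same shape as the address byte sent on the bus: for slave address 0x50 and a read, bits 31:24 are 0xA1, that is, the 7-bit address followed by the direction bit.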
ADDRESS must be programmed to contain the 7 bits of the desired slave address.

The DIRECTION bitfield controls read/write operation on the I2C interface. The bit definition is:
- DIRECTION = 0 -> I2C write
- DIRECTION = 1 -> I2C read

The COUNT field must contain the desired byte count for the current transfer. The COUNT field decrements by one for each data byte transferred across I2C. The remaining byte count for the current transfer can be read from the COUNT field at any time. Note that the DSPCPU must refrain from rewriting the IIC_AR register until the current transfer completes, to avoid corrupting the byte count or address fields.

Note: for writes, the byte count decrements before the byte is actually transferred over the I2C bus. However, the last byte is saved in an internal register and the DSPCPU can write a new word when COUNT = 0.

16.4.2 IIC_DR register

The IIC_DR register contains the actual data transferred during I2C operation. For a master transmit operation, data transfer is initiated when data is written to this register. Transmission begins with the transfer of the address byte in the IIC_AR register, followed by the data bytes that were written to the IIC_DR register, BYTE3 first and BYTE0 last. The I2C interface interrupts for more transmit data to be written to IIC_DR until the transfer byte count in the COUNT field of the IIC_AR register is reached.

In master receive operation, one or more received data bytes are placed in the IIC_DR register by the I2C interface. Received data bytes are loaded into the IIC_DR register starting with BYTE3, then BYTE2, BYTE1 and BYTE0. The number of bytes the DSPCPU requests for a transfer is written into the COUNT bitfield of the IIC_AR register. The transfer completes when the I2C interface has received the number of bytes indicated by the COUNT bitfield of the IIC_AR register.

Figure 16-2. I2C registers. [Figure not reproduced. Register offsets from MMIO_BASE, with the field names shown in the figure:]
IIC_AR (r/w) | 0x10 3400 | address, direction, reserved, count
IIC_DR (r/w) | 0x10 3404 | byte3, byte2, byte1, byte0
IIC_SR (r/o) | 0x10 3408 | gdi, fi, sanacki, sdnacki, sda_stat, scl_stat, state, direction, reserved, rbc
IIC_CR (r/w) | 0x10 340C | gd_ien, f_ien, sanack_ien, sdnack_ien, clrgdi, clrfi, clrsanacki, clrsdnacki, sw_mode_en, sda_out, scl_out, enable
16.4.3 IIC_SR register

The I2C status register contains status information regarding the transfer in progress and the nature of interrupts associated with I2C operation. The IIC_SR register is read only and is intended as the primary source of status regarding current I2C operation.

The IIC_SR register must be used in conjunction with the IIC_CR register. The interrupt sources of the IIC_SR register are individually enabled by writing to the appropriate enable bit in the IIC_CR register. The bitfield definitions of the IIC_SR register are presented in Table 16-3.

IIC_SR provides four sources of interrupts. Note: the interrupt should be set up as a level triggered interrupt.
- GDI interrupt: the GDI bit together with the FI bit provides status about I2C transfer completion. The interpretation of GDI/FI bit combinations differs depending on whether the I2C interface is in master transmit or master receive mode. Refer to Table 16-4 and Table 16-6 for the GDI/FI interpretation.
- FI interrupt: see the GDI bit definition and the GDI/FI transmit and receive definitions in Table 16-4 and Table 16-6.
- SANACKI interrupt: this interrupt flag bit indicates that a slave address was transmitted but no slave on the I2C bus acknowledged the address to claim the transaction. This is an error condition. Once the I2C interface has set this interrupt flag, the interface is idle. The DSPCPU should clear this interrupt flag by writing a '1' to IIC_CR.CLRSANACKI before re-attempting this transfer or starting another I2C transfer.
- SDNACKI interrupt: this interrupt flag bit indicates that an addressed slave receiver device has refused to acknowledge the current byte of data for an ongoing transfer. This is an error condition. Once the I2C interface has set this interrupt flag, the interface is idle. The DSPCPU should clear this interrupt flag by writing a '1' to IIC_CR.CLRSDNACKI before retrying this transfer or starting another.

The SDA_STAT and SCL_STAT bits indicate the current state of the SDA and SCL signals. The STATE field indicates the microactivity of the I2C interface. The field values and their meanings are presented in Table 16-5.

The DIRECTION status bit indicates whether the I2C interface is in transmit or receive mode.
- If DIRECTION = 0 then I2C is a transmitter.
- If DIRECTION = 1 then I2C is a receiver.

The RBC bitfield indicates the remaining byte count for an I2C transfer in progress. The IIC_SR.RBC bitfield serves as a read-only 'shadow register' for the IIC_AR.COUNT bitfield. During an I2C transfer, the RBC bitfield reflects the remaining byte count. To avoid corrupting an I2C transfer, the DSPCPU must refrain from writing to the IIC_AR.COUNT bitfield until a message is complete. Completion is indicated by the RBC bitfield decrementing to zero.

Table 16-3. IIC_SR register
Bits | Field name | Definition
31 | GDI | Good data interrupt. This is the normal transfer complete interrupt flag. This interrupt may be asserted without the IIC_SR.FI interrupt bit at the end of an I2C transfer or after master abort of an I2C transfer.
30 | FI | Full interrupt. This interrupt indicates the condition of the IIC_DR register, dependent upon whether the I2C interface is in receive or transmit mode.
29 | SANACKI | Slave address no acknowledge interrupt.
28 | SDNACKI | Slave data no acknowledge interrupt.
27 | SDA_STAT | This bit is used to examine the state of the external I2C SDA data pin. Bit polarity is: 1 = SDA pad is low, 0 = SDA pad floated high.
26 | SCL_STAT | This bit is used to examine the state of the external I2C SCL clock pin. Bit polarity is: 1 = SCL pad is low, 0 = SCL pad floated high.
25:23 | STATE | The STATE field indicates the microactivity of the I2C bus.
22 | DIRECTION | Direction of current data transfer.
21 | reserved | Read as '0'
15:8 | RBC | Remaining byte count.
7:0 | reserved | Read as '0'

Table 16-4. Master transmit mode GDI/FI status
GDI | FI | Description
0 | 0 | Message is not complete. The IIC_DR is not empty. No interrupt.
0 | 1 | Message is not complete. The IIC_DR is empty and the requested transmit byte count is not equal to 0. The DSPCPU must write additional bytes of the current transfer to the IIC_DR register.
1 | x | Message transmission has completed. The IIC_DR is empty. The byte transmit count = 0.

Table 16-5. STATE field values
STATE | Meaning
000 | I2C interface is idle.
001 | Reserved for future use
010 | Idle (message is done, awaiting clear GDI to go to the 000 state)
011 | Address phase is being processed
100 | BYTE3 (first byte) is being processed
101 | BYTE2 is being processed
110 | BYTE1 is being processed
111 | BYTE0 (last) is being processed

Table 16-6. Master receive GDI/FI conditions
GDI | FI | Description
0 | 0 | Message is not complete. IIC_DR is not full. No interrupt.
0 | 1 | IIC_DR contains received data and needs read servicing. More data bytes are expected since the receive byte count is not equal to 0.
1 | x | The transfer has been completed and the receive byte count is equal to 0. 0 to 4 valid bytes are in the IIC_DR register awaiting read servicing by the DSPCPU.

16.4.4 IIC_CR register

The I2C control register contains control information required for enabling I2C transfers. This register is used to enable and clear interrupt sources which normally occur during I2C operation. The four interrupt sources described in the section on the IIC_SR register are enabled and cleared through the IIC_CR register. The enable bitfields are:
- GD_IEN: enable for normal transfer complete interrupt.
- F_IEN: enable for IIC_DR data service request interrupt.
- SANACK_IEN: enable for slave address not acknowledged interrupt. This is an error interrupt.
- SDNACK_IEN: enable for slave data not acknowledged interrupt. An addressed slave receiver has refused to accept the last byte transmitted to it. This is handled as an error interrupt.

In addition to the interrupt enable bits, IIC_CR contains interrupt clear bits associated with each of the interrupt sources in the IIC_SR register. These IIC_CR interrupt clear bits are defined as:
- CLRGDI: clear bit for the GDI interrupt in the IIC_SR register. Writing a '1' to this bit clears the GDI interrupt.
- CLRFI:
Clear bit for the FI interrupt in the IIC_SR register. Writing a '1' to this bit clears the FI interrupt.
- CLRSANACKI: clear bit for the SANACKI interrupt in the IIC_SR register. Writing a '1' to this bit clears the SANACKI interrupt.
- CLRSDNACKI: clear bit for the SDNACKI interrupt in the IIC_SR register. Writing a '1' to this bit clears the SDNACKI interrupt.

The remaining bitfield of the IIC_CR register is:
- ENABLE: master enable for the I2C serial interface. ENABLE must be set equal to '1' to transfer any bits from the I2C interface block. Writing a '0' to the ENABLE bit effectively resets the entire I2C interface, including all status and interrupt flag bits. A transfer in progress is aborted and the byte currently being transferred is lost.

Note 1: for writes, the reserved1, 2, 3 and 4 bitfields must always be written with '0's.

Table 16-7. IIC_CR register
Bits | Field name | Definition
31 | GD_IEN | Enable for normal transfer complete interrupt
30 | F_IEN | Enable for IIC_DR data service request interrupt
29 | SANACK_IEN | Enable for slave address not acknowledged interrupt
28 | SDNACK_IEN | Enable for slave data not acknowledged interrupt. An addressed slave receiver has refused to accept the last byte transmitted to it.
27:26 | reserved1 | Always write '0's to these bits. (See Note 1)
25 | CLRGDI | Clear bit for the GDI interrupt in the IIC_SR register. Writing a '1' to this bit clears the GDI interrupt.
24 | CLRFI | Clear bit for the FI interrupt in the IIC_SR register. Writing a '1' to this bit clears the FI interrupt.
23 | CLRSANACKI | Clear bit for the SANACKI interrupt in the IIC_SR register. Writing a '1' to this bit clears the SANACKI interrupt.
22 | CLRSDNACKI | Clear bit for the SDNACKI interrupt in the IIC_SR register. Writing a '1' to this bit clears the SDNACKI interrupt.
21:6 | reserved2 | Always write '0's to these bits. (See Note 1)
10 | SW_MODE_EN | 0 (power-on/reset default): normal I2C hardware operating mode. 1: enable software operating mode. The I2C pins are entirely controlled by user writes to the SDA_OUT and SCL_OUT register bits.
7 | SDA_OUT | Enabled by SW_MODE_EN. This bit is used by software to manually control the external I2C SDA data pin. Bit polarity is: 1 = SDA pad pulled low, 0 = SDA pad left open drain.
6 | SCL_OUT | Enabled by SW_MODE_EN. This bit is used by software to manually control the external I2C SCL clock pin. Bit polarity is: 1 = SCL pad pulled low, 0 = SCL pad left open drain.
5:2 | reserved3 | Always write '0's to these bits. (See Note 1)
1 | reserved4 | Always write '0's to these bits. (See Note 1)
0 | ENABLE | I2C serial interface enable
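The IIC_SR bit positions in Table 16-3 and the IIC_CR bit positions in Table 16-7 can be turned into a decoder and a few masks, as sketched below. The struct, macro and function names are hypothetical; the bit positions are taken directly from the tables. The sketch only manipulates values and leaves the actual MMIO access out.

```c
#include <stdint.h>

/* IIC_CR bits, per Table 16-7. */
#define IIC_CR_GD_IEN      (1u << 31)
#define IIC_CR_F_IEN       (1u << 30)
#define IIC_CR_SANACK_IEN  (1u << 29)
#define IIC_CR_SDNACK_IEN  (1u << 28)
#define IIC_CR_CLR_ALL     (0xFu << 22)  /* CLRGDI, CLRFI, CLRSANACKI, CLRSDNACKI */
#define IIC_CR_ENABLE      (1u << 0)

/* Decoded view of IIC_SR, per Table 16-3. */
typedef struct {
    int gdi, fi, sanacki, sdnacki;  /* interrupt flags (bits 31..28)      */
    int sda_low, scl_low;           /* pad state, 1 = low (bits 27, 26)   */
    unsigned state;                 /* STATE field (bits 25:23)           */
    int receiving;                  /* DIRECTION, 1 = receiver (bit 22)   */
    unsigned rbc;                   /* remaining byte count (bits 15:8)   */
} iic_status;

static inline iic_status iic_sr_decode(uint32_t sr)
{
    iic_status s;
    s.gdi       = (int)((sr >> 31) & 1u);
    s.fi        = (int)((sr >> 30) & 1u);
    s.sanacki   = (int)((sr >> 29) & 1u);
    s.sdnacki   = (int)((sr >> 28) & 1u);
    s.sda_low   = (int)((sr >> 27) & 1u);
    s.scl_low   = (int)((sr >> 26) & 1u);
    s.state     = (unsigned)((sr >> 23) & 0x7u);
    s.receiving = (int)((sr >> 22) & 1u);
    s.rbc       = (unsigned)((sr >> 8) & 0xFFu);
    return s;
}

/* A transfer has failed if either no-acknowledge flag is set. */
static inline int iic_transfer_failed(const iic_status *s)
{
    return s->sanacki || s->sdnacki;
}
```

A value such as IIC_CR_ENABLE | IIC_CR_CLR_ALL | IIC_CR_GD_IEN would clear all four interrupt flags while keeping the interface enabled and the completion interrupt armed; whether the enable bits need to be rewritten on every clear is not stated in the text.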
16.5 I2C software operation mode

I2C software operation mode is intended for use by software I2C or similar algorithm implementations. In this case, the SCL and SDA pins are fully controlled and observed by software, and the hardware I2C interface is disconnected from the SCL and SDA pins. Refer to Figure 16-3 for a clarification of the principles involved.

Software mode is disabled by default after boot. It is enabled by writing a '1' to IIC_CR.SW_MODE_EN. At that point, the SCL and SDA pins can be controlled by the IIC_CR SDA_OUT and SCL_OUT bits. Writing a '1' to either bit causes the corresponding pin to become active, i.e. to be pulled low. The SDA and SCL lines are open-collector outputs, and can hence also be pulled low by external devices. The actual pin state can be observed by software by examining the IIC_SR SDA_STAT and SCL_STAT bits. A '1' in these MMIO bits indicates that the corresponding pin is currently pulled low.

With appropriate software, possibly using a timer interrupt, full I2C functionality can be implemented using this mechanism.

16.6 I2C hardware operation mode

Hardware operation of I2C is the default mode after boot. The PNX1300 I2C hardware interface operates in one of two modes:
1. Master-transmitter (to write data to a slave)
2. Master-receiver (to read data from a slave)

As a master, the I2C logic generates all the serial clock pulses and the start and stop bus conditions. The start and stop bus conditions are shown in Figure 16-4. A transfer is ended with a stop condition or a repeated start condition. Since a repeated start condition is also the beginning of the next serial transfer, the I2C bus will not be released.

Note: the I2C interface on the PNX1300 operates as a master only!

The number of bytes transferred between the start and stop conditions from transmitter to receiver is not limited.
each 8-bit data byte is followed by one acknowledge bit. the transmitter releases the sda line, which is pulled up to a high level during the acknowledge bit time. the receiver acknowledges by pulling the data line low during this acknowledge period. the master must always generate the scl transitions for the acknowledge bit time.

figure 16-3. i2c software mode only logic (diagram not reproduced; its labels are the open-drain scl and sda pads, the scl_out and sda_out tri-state buffers controlled by sw_mode_en, the scl_stat and sda_stat buffers, and the hardware i2c block on the data highway)
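the software operation mode of section 16.5 can be illustrated with a small helper that computes the iic_cr value for a desired pair of line levels. this is a minimal sketch, not philips code: the function and macro names are hypothetical, only the bit positions of sda_out (bit 7), scl_out (bit 6) and the reserved bits 5:1 come from table 16-7, and the helper deliberately does not touch the mmio register itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* bit positions from table 16-7; the macro names are illustrative only */
#define IIC_CR_SDA_OUT  (1u << 7)  /* 1 = sda pad pulled low, 0 = left open drain */
#define IIC_CR_SCL_OUT  (1u << 6)  /* 1 = scl pad pulled low, 0 = left open drain */
#define IIC_CR_RESERVED 0x3Eu      /* bits 5:1, must always be written as 0 */

/*
 * return the value to write to iic_cr so that sda and scl take the
 * requested levels in software mode. "high" means the pad is released
 * and the external pull-up (or another device) decides the level.
 * all bits other than sda_out, scl_out and the reserved bits are kept.
 */
uint32_t iic_cr_sw_lines(uint32_t cr, bool sda_high, bool scl_high)
{
    cr &= ~(IIC_CR_SDA_OUT | IIC_CR_SCL_OUT | IIC_CR_RESERVED);
    if (!sda_high)
        cr |= IIC_CR_SDA_OUT;
    if (!scl_high)
        cr |= IIC_CR_SCL_OUT;
    return cr;
}
```

a start condition, for example, would be produced by writing the value for (sda high, scl high) followed by the value for (sda low, scl high). because the lines are open collector, software should read back sda_stat and scl_stat rather than assume the pad followed the written value.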
two types of data transfers are supported by the pnx1300 i2c interface:

• data transfer from a master transmitter to a slave receiver, also called a write operation. the master first transmits a 1-byte slave address, then the desired number of data bytes. the slave receiver returns an acknowledge bit after each byte. the master terminates the transaction with a stop after the last byte.
• data transfer from a slave transmitter to a master receiver, also called a read operation. the first byte (the slave address) is transmitted by the master and acknowledged by the slave. the selected slave transmits successive data bytes, each of which is acknowledged by the master, except the last byte desired by the master, for which the master generates a "notack" condition. this causes the slave to terminate byte transmission. the slave transmitter then must release the bus so that the master may generate a stop condition.

the type of transaction is indicated by the lsbit of the address byte. a write (master transmitter to slave receiver) is signified by a '0' in the lsbit of the address byte. a read (slave transmitter to master receiver) is signified by a '1' in the lsbit of the address byte.

example steps for successful programming of the i2c interface on pnx1300 are outlined below for both reads and writes. enable the i2c interface prior to attempting any accesses to external i2c devices. to enable the interface:

• set bit iic_cr.enable (0x10340c) = 1

for write addressing mode:

1. on entry, clear any possible i2c interrupt sources by writing iic_cr bits [25:22] = '1111'. (note that programmers must mask and enable high-level interrupt sources through the vic facility in the dspcpu. see the appropriate pnx1300 databook chapter.)
2. enable the desired i2c interrupt sources by setting the iic_cr[31:28] bits appropriately.
3. simultaneously load iic_ar[31:25] with the 7-bit slave address, iic_ar.direction = 0 and iic_ar[15:8] with the appropriate bytecount for the transfer.
4. load iic_dr[31:0] with data for the write. note that writing this register triggers the transfer across the i2c bus. up to 4 bytes will be transferred after writing, depending on the bytecount in iic_ar[15:8]. transfers of more than 4 bytes have to be broken down into a sequence of 4-byte transfers and a last transfer which may be less than 4 bytes. this is done by repeatedly reloading the register until the bytecount is fulfilled. transfer is done high byte first, proceeding to low byte.
5. detect the resulting i2c condition code in iic_sr[31:28] and respond, or detect the i2c high-level interrupt and respond. (note that this last step is dependent upon system software requirements.)
6. if the transfer count is not yet fulfilled, clear the gdi and fi bits and return to step 4 until all data is written.

for read addressing mode:

1. on entry, clear any possible i2c interrupt sources by writing iic_cr bits [25:22] = '1111'. (note that programmers must mask and enable high-level interrupt sources through the vic facility in the dspcpu. see the appropriate databook chapter.)
2. enable the desired i2c interrupt sources by setting the iic_cr[31:28] bits appropriately.
3. simultaneously load iic_ar[31:25] with the 7-bit slave address, iic_ar.direction = 1 and iic_ar[15:8] with the appropriate bytecount for the transfer. note that writing this register triggers the read across the i2c bus.
4. detect the resulting i2c condition in iic_sr[31:28] and respond, or detect the i2c interrupt and respond. (note that this last step is dependent upon system software requirements.)
5. clear the gdi and fi bits and read the contents of iic_dr. up to 4 bytes will be available in iic_dr, fewer if the remaining bytecount was less than 4. bytes are stored high byte first, proceeding to low byte.
6. return to step 4 until all data is read, i.e. the bytecount is fulfilled.

16.6.1 slave nak

if a slave device does not generate an ack where required, this is considered a nak. upon receipt of a nak after transmitting a device address or data byte, the master takes the following actions:

• the i2c state becomes idle (state = 000)
• a stop condition is issued on the bus
• no more data is sent

figure 16-4. start and stop conditions on i2c (waveform not reproduced; it shows sda and scl with the start (s) and stop (p) conditions)
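the data handling in the write and read procedures of section 16.6 can be sketched with three small pure functions. they are illustrative only and all names are hypothetical. the address-byte layout (7-bit address, lsbit = direction), the 4-byte limit per iic_dr load and the "high byte first" order come from the text; placing a short final group in the upper bytes of the word is an assumption, since the text does not state where fewer than 4 bytes sit inside iic_dr.

```c
#include <stddef.h>
#include <stdint.h>

/* first byte on the bus: 7-bit slave address, lsbit 0 = write, 1 = read */
uint8_t i2c_address_byte(uint8_t addr7, int is_read)
{
    return (uint8_t)(((addr7 & 0x7Fu) << 1) | (is_read ? 1u : 0u));
}

/*
 * pack up to 4 bytes into a 32-bit word for iic_dr, high byte first.
 * assumption: when n < 4 the bytes occupy the most significant positions.
 */
uint32_t iic_dr_pack(const uint8_t *buf, size_t n)
{
    uint32_t word = 0;
    size_t i;

    if (n > 4)
        n = 4;
    for (i = 0; i < n; i++)
        word |= (uint32_t)buf[i] << (24 - 8 * i);
    return word;
}

/* number of iic_dr loads (or reads) needed for a given bytecount */
size_t iic_dr_loads(size_t bytecount)
{
    return (bytecount + 3) / 4;
}
```

a driver following the write procedure would load iic_ar once, then call iic_dr_pack for each group of up to 4 bytes, waiting for the condition code or interrupt of step 5 between loads.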
16.7 i2c clock rate generation

the i2c hardware block diagram is shown in figure 16-5. in hardware operating mode, the iic_scl external clock is derived by division from the boot_clk pin on pnx1300. the boot_clk pin is normally connected to tri_clkin. the iic_scl clock divider value is determined at boot time and cannot be changed thereafter. the value chosen depends on the first byte read from the eeprom, as described in section 13.2.1, "boot procedure common to both autonomous and host-assisted bootstrap."

the pnx1300 i2c block supports "clock stretching" for slaves that need to slow down byte transfer. a slave stretches the clock by holding the scl line low after completion of a byte transfer and acknowledge sequence. clock stretching is always enabled.

table 16-8. i2c speed and eeprom byte 0

boot_clk bits   eeprom speed bit   divider value   actual i2c speed
00 (100 mhz)    0 (100 khz)        1008            99.2 khz
00              1 (400 khz)        256             390.6 khz
01 (75 mhz)     0 (100 khz)        752             99.7 khz
01              1 (400 khz)        192             390.6 khz
10 (50 mhz)     0 (100 khz)        512             97.6 khz
10              1 (400 khz)        128             390.6 khz
11 (33 mhz)     0 (100 khz)        336             98.2 khz
11              1 (400 khz)        96              343.8 khz

figure 16-5. i2c block diagram (diagram not reproduced; its blocks are the boot state machine and logic, reset logic, i2c clock generator, i2c interface state machine, serializer/deserializer, the iic_ar address register and iic_dr data register, and the iic_scl and iic_sda pads, connected to the data highway)
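the "actual i2c speed" column of table 16-8 is simply the boot clock frequency divided by the divider value. the sketch below reproduces that arithmetic. the type and function names are hypothetical, and the nominal boot clock frequencies are assumed to be exactly 100, 75, 50 and 33 mhz.

```c
#include <stdint.h>

/* one row pair of table 16-8: boot clock and the two divider values */
typedef struct {
    uint32_t boot_clk_hz;  /* nominal boot_clk frequency (assumed exact) */
    uint16_t div_100k;     /* divider when the eeprom speed bit is 0 */
    uint16_t div_400k;     /* divider when the eeprom speed bit is 1 */
} iic_clk_row;

/* indexed by the 2-bit boot_clk field of eeprom byte 0 */
static const iic_clk_row iic_clk_table[4] = {
    { 100000000u, 1008, 256 },
    {  75000000u,  752, 192 },
    {  50000000u,  512, 128 },
    {  33000000u,  336,  96 },
};

/* resulting scl frequency in hz (integer division, rounded down) */
uint32_t iic_scl_hz(unsigned boot_clk_bits, unsigned speed_bit)
{
    const iic_clk_row *row = &iic_clk_table[boot_clk_bits & 3u];
    return row->boot_clk_hz / (speed_bit ? row->div_400k : row->div_100k);
}
```

for the 100 mhz setting with the 100 khz speed bit this gives 100000000 / 1008 = 99206 hz, which the table rounds to 99.2 khz.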
chapter 17  synchronous serial interface

17.1 synchronous serial interface overview

in this document, the generic pnx1300 name refers to the pnx1300 series, or the pnx1300/01/02/11 products.

the pnx1300 synchronous serial interface (ssi) unit interfaces to an off-chip modem analog front end (mafe) subsystem, network terminator, adc/dac or codec through a flexible bit-serial connection. the hardware performs full-duplex serialization/deserialization of a bit stream from any of these devices. any such front end device must support transmitting and receiving data, and initialization, via a synchronous serial interface.

since the communication algorithm is implemented in software by the pnx1300 dspcpu and the analog interface is off chip, a wide variety of modem, network and/or fax protocols may be supported.

the ssi hardware includes:

• a 16-bit receive shift register (rxsr), synchronized by an external receive frame synchronization pulse (ssi_rxfsx) and clocked by an external clock (ssi_rxclk)
• a 32-bit mmio receive data register (ssi_rxdr) to provide data access from the dspcpu
• a 32-entry deep, 16-bit wide receive buffer (rxfifo), to buffer between the receive shift register (rxsr) and the mmio receive data register (ssi_rxdr)
• a 16-bit transmit shift register (txsr), synchronized by an external or internal transmit frame synchronization pulse and clocked by an external clock (either ssi_io1 or ssi_rxclk)
• a 32-bit mmio transmit data register (ssi_txdr) to transmit data from the dspcpu
• a 30-entry deep, 16-bit wide transmit buffer (txfifo), to buffer between the mmio transmit data register (ssi_txdr) and the transmit shift register (txsr)
• transmit frame sync pulse generation logic
• control and status logic
• interrupt generation logic

the ssi unit is not a highway bus master. all i/o is completed through dspcpu mmio cycles. fifos are used to increase the allowable interrupt response time and decrease the interrupt rate.
17.2 interface

the external interface consists of the 6 pins described in table 17-1.

table 17-1. synchronous serial interface pins

name         type    description
ssi_rxclk    in-5    serial interface clock signal; provided by an external communication device.
ssi_rxfsx    in-5    frame synchronization reference signal; provided by an external communication device.
ssi_rxdata   in-5    receive serial data signal; provided by the receive channel of an external communication device.
ssi_txdata   out     transmit serial data signal output.
ssi_io1      i/o-5   transmit clock input or general purpose i/o pin.
ssi_io2      i/o-5   transmit frame synchronization signal input or output, or general purpose i/o pin.

17.3 block diagram

the main block diagram of the ssi unit is illustrated in figure 17-1.

the i/o block is used for control of the i/o pins and for selecting the transmit clock and transmit frame synchronization signals.

the frame synchronization block can be used for generating an internal synchronization signal derived from the receive clock input (ssi_rxclk) or from an i/o pin (ssi_io1).

the ssi transmit block buffers and transmits the bits using the generated frame synchronization signal (txfsx) and the transmit clock. the transmit clock is either the receive clock or the clock present on ssi_io1.

the ssi receive block receives and buffers the bits on the ssi_rxdata line, using the receive clock (ssi_rxclk) and the receive frame synchronization signal (ssi_rxfsx).

each of the blocks is described in detail in the next subsections.
17.3.1 general purpose i/o

figure 17-2 illustrates the functionality of the general purpose i/o pins. the ssi_io1 and ssi_io2 external pins may be used as general purpose i/o by proper configuration of the ssi_ctl register, or they may be used as transmit clock input and as transmit framing signal input or output. the ssi_ctl.io1 and ssi_ctl.io2 mode select fields control the direction and functionality of these two pins. a hardware reset or a software reset of the transmitter through the ssi_ctl.txr command sets the ssi_ctl.io1 and ssi_ctl.io2 fields to 11b, a conflict-free initial pin state. table 17-2 shows the effect of ssi_ctl.io1 on pin ssi_io1; table 17-3 shows the effect of ssi_ctl.io2 on ssi_io2.

note: if ssi_io1 is not selected as transmit clock input, the transmit clock is taken from the receive clock signal instead. if ssi_io2 is not selected as transmit framing signal input or output, the transmit framing signal is taken from the receive framing signal instead.

figure 17-1. the ssi interface block diagram (diagram not reproduced; its blocks are the i/o control block with ssi_io1 and ssi_io2, the frame synchronization block with ssi_rxclk, ssi_rxfsx and txfsx, the ssi transmit block with txclk and ssi_txdata, and the ssi receive block with ssi_rxdata)

figure 17-2. i/o block diagram (diagram not reproduced; it shows the 2:1 multiplexers that select txclk from ssi_io1 or ssi_rxclk and txfsx from ssi_rxfsx, the internal txfsx or ssi_io2, together with the wio1/rio1 and wio2/rio2 general purpose paths, as selected by io1[1:0] and io2[1:0])
17.3.2 frame synchronization

the internal frame synchronization logic is illustrated in figure 17-3. an internal frame synchronization signal (txfsx) is generated from the transmit or receive clock selected by ssi_ctl.io1. the clock is divided by the word length (16) and by a frame rate divider which is controlled by the fss[3:0] bits in the ssi_ctl register. fms determines the frame mode operation, i.e. whether the frame sync pulse is word-length or bit-length. the transmit framing signal is selected depending on ssi_ctl.io2, as shown in table 17-4.

17.3.3 ssi transmit

the transmitter control block diagram is illustrated in figure 17-4. the transmitter clock can be selected from two sources, i.e. ssi_io1 or ssi_rxclk, by programming the io1[1:0] bits in the ssi_ctl register (see figure 17-2). a transfer takes place on either the rising or falling edge of the clock, which can be configured with ssi_ctl.tcp.

the transmitter has a 30-entry deep, 16-bit transmit buffer that buffers the data between the 32-bit ssi_txdr register and the 16-bit transmit shift register (txsr).

the txsr is a 16-bit transmit shift register. it can be configured to shift out msb or lsb first with ssi_ctl.tsd.

a detailed description of the configuration of the transmitter can be found in the ssi_ctl and ssi_csr register descriptions (17.10.1 and 17.10.2).

ssi_txdr is a 32-bit mmio transmit register.

17.3.4 ssi receive

the receiver control block diagram is illustrated in figure 17-5. the receiver clock, frame synchronization and data signal are always taken from the external pins.

the receiver has a 32-entry deep, 16-bit receive buffer that buffers the data between the 16-bit receive shift register (rxsr) and the 32-bit ssi_rxdr register.

the input pin ssi_rxdata provides serial shift-in data to the rxsr. the rxsr is a 16-bit receive shift register.
rxsr can be configured to shift in msb or lsb first using ssi_ctl.rsd. a transfer takes place on either the rising or falling edge of the receiver clock, which can be configured with ssi_ctl.rcp.

table 17-2. effect of ssi_ctl.io1 on ssi_io1

io1[1:0]   function of ssi_io1
00         general purpose output with positive logic polarity, reflecting the value in ssi_ctl.wio1.
01         general purpose input, with optional change detector function. the input state can be read from ssi_csr.rio1. the change detector is clocked by the highway bus. the change detector may optionally generate an interrupt, under the control of the cde bit of ssi_ctl.
10         transmit clock (txclk) input.
11         tri-state, input signal value ignored.

table 17-3. effect of ssi_ctl.io2 on ssi_io2

io2[1:0]   function of ssi_io2
00         general purpose output with positive logic polarity, reflecting the value in ssi_ctl.wio2.
01         general purpose input. the input state can be read from ssi_csr.rio2. no change detector is provided for this pin.
10         internal transmit framing signal (txfsx) output.
11         transmit framing signal (txfsx) input.

figure 17-3. frame synchronization generation block diagram (diagram not reproduced; txclk is selected from ssi_io1 when io1[1:0] = 10 and from ssi_rxclk otherwise, and passes through the word length divider, the frame rate divider controlled by fss[3:0] and the frame sync mode logic controlled by fms to produce the internal txfsx)

table 17-4. effect of ssi_ctl.io2 on transmit framing signal

io2[1:0]   source of transmit framing signal
00         taken from rxfsx
01         taken from rxfsx
10         internally generated
11         taken from ssi_io2 pin
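the source selection described by the note in section 17.3.1 and by table 17-4 can be written as two small decode functions. this is a sketch with hypothetical enum and function names. it assumes that the two-digit codes in the tables are binary values of the 2-bit io1 and io2 fields, most significant bit first.

```c
/* possible sources of the transmit clock and transmit framing signal */
typedef enum { TXCLK_FROM_RXCLK, TXCLK_FROM_IO1 } ssi_txclk_src;
typedef enum { TXFSX_FROM_RXFSX, TXFSX_INTERNAL, TXFSX_FROM_IO2 } ssi_txfsx_src;

/* txclk comes from ssi_io1 only when io1 = 10b, else from the receive clock */
ssi_txclk_src ssi_txclk_source(unsigned io1)
{
    return ((io1 & 3u) == 2u) ? TXCLK_FROM_IO1 : TXCLK_FROM_RXCLK;
}

/* table 17-4: 00 and 01 use rxfsx, 10 is internal, 11 is the ssi_io2 pin */
ssi_txfsx_src ssi_txfsx_source(unsigned io2)
{
    switch (io2 & 3u) {
    case 2u:
        return TXFSX_INTERNAL;
    case 3u:
        return TXFSX_FROM_IO2;
    default:
        return TXFSX_FROM_RXFSX;
    }
}
```

after reset both fields are 11b, so under these assumptions the transmit clock defaults to the receive clock and the transmit framing signal is taken from the ssi_io2 pin.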
a detailed description of the configuration of the receiver can be found in the ssi_ctl and ssi_csr register descriptions (17.10.1 and 17.10.2).

ssi_rxdr is a 32-bit mmio receive data register.

due to the possibility of speculative reading of the ssi_rxdr, the read itself cannot be implemented to acknowledge the data as a side effect. for this reason an explicit acknowledge mechanism is provided by the ssi_rxack register.

the ssi_rxack is a 1-bit mmio register that is used to signal the ssi receiver state machine that a word has been successfully read from the ssi_rxdr. writing a '1' to this register initiates updating of the internal state. writing a '0' has no effect. the register cannot be read; its effect may be observed in the war field of the ssi_csr.

the status fields of the ssi_csr will update within 1 highway clock cycle after writing to the ssi_rxack register.

figure 17-4. the sync serial interface transmit block diagram (diagram not reproduced; its blocks are the transmit data register ssi_txdr, the 64-byte transmit buffer, the transmit shift register txsr driving ssi_txdata, and the transmit control logic with txclk, txfsx and the transmit control and status registers)

figure 17-5. the ssi receive block diagram (diagram not reproduced; its blocks are the receive shift register rxsr fed by ssi_rxdata, the 64-byte receive buffer, the receive data register ssi_rxdr, and the receive control logic with ssi_rxclk, ssi_rxfsx and the receive control and status registers)
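the explicit acknowledge mechanism can be sketched as a read of ssi_rxdr followed by a write of '1' to ssi_rxack. the function name is hypothetical, and the register addresses are passed in as pointers so that the sketch makes no assumption about how mmio space is mapped in a given system.

```c
#include <stdint.h>

/*
 * read one 32-bit word from the receiver and acknowledge it.
 * rxdr  : address of ssi_rxdr (the read itself has no side effect)
 * rxack : address of ssi_rxack (writing 1 tells the receive state
 *         machine that the word was consumed; writing 0 does nothing)
 */
uint32_t ssi_read_word(volatile const uint32_t *rxdr, volatile uint32_t *rxack)
{
    uint32_t data = *rxdr;
    *rxack = 1u;
    return data;
}
```

on hardware the two pointers would be derived from mmio_base plus the ssi_rxdr and ssi_rxack offsets listed in section 17.10. software would normally check the war field of ssi_csr before calling such a function, since the sketch does not test whether data is available.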
17.4 ssi transmit operation

17.4.1 setup ssi_ctl

write the ssi_ctl to reset and enable the transmitter. both the transmitter and receiver must be reset simultaneously. this will set all registers and internal logic to the same state as after a power-up reset. the recommended procedure is to set up all transmitter-related control bits before performing a txe assert. in particular, fields tcp, rsd, io1, io2, fms, fsp, mod and tms should not be changed after enabling the transmitter until after the next transmitter reset.

the txclk is taken from the ssi_io1 pin or from the receive clock, depending on ssi_ctl.io1. the direction of shift in the txsr and the clock edge on which to shift must also be configured in ssi_ctl. if the dspcpu does not poll the ssi status registers, it should enable the transmitter interrupt and set the ils field by writing to the ssi_ctl to allow interrupt-driven servicing of the ssi. note that both transmit and receive use the same ils field. set the framing controls, slot size, and mode required according to the external communication circuit's requirements by writing the ssi_ctl. finally, set the interrupt level to respond to empty levels in the txfifo.

note that the rx and tx machines share the framing and clock divide controls. they cannot be set to different values for rx and tx.

if the rxclk used to derive the txclk needs a divide by two, this is done by setting ssi_csr.cd2.

17.4.2 operation details

the transmit state machine will wait for transmit data to be written to the ssi_txdr register (see also figure 17-6). as soon as ssi_txdr is written, its value will be propagated through two entries of the txfifo (txfifo is 16-bit and ssi_txdr is 32-bit) and transferred to txsr, synchronized to txfsx. the order of transferring the two 16-bit parts of the 32-bit ssi_txdr can be configured by the endian bit ssi_ctl.ems.
data will begin shifting out of txsr, one bit for each active edge of the txclk, from either bit 15 (msb first ssi_ctl setting) or from bit 0 (lsb first) until txsr is empty. for endian control and shift direction see also subsection 17.8. when the shift register is empty, the transmit state machine will load the value from the next available txfifo location and begin shifting out that data. the transmission continues until the transmit state machine is disabled or reset.

if the last available txfifo entry has not been updated at the appropriate time to reload txsr, the last transmitted frame is retransmitted and a transmit underrun error is indicated in the transmitter status ssi_csr.tue.

17.4.3 interrupt and status

the refill status of the ssi_txdr register is stored in ssi_csr. as the transmit state machine loads a txfifo entry into the txsr, it sets the associated status bits. the ssi will generate an internal interrupt when the number of empty words in the txfifo rises above the level set by the ils field. if the transmit state machine attempts to read a txfifo entry while the last available txfifo entry has not been updated, it will set the transmit underrun bit. this can cause a protocol error in the transmission.

the number of available word buffers (ssi_csr.waw) and the transmitter data register empty (ssi_csr.tde) information are updated automatically by the ssi block.

figure 17-6. the transmit buffer operation (diagram not reproduced; the 32-bit mmio register ssi_txdr is written from the highway at wr_ptr into the 30-entry deep, 16-bit wide buffer, which is read at rd_ptr into the 16-bit txsr driving ssi_txdata)
17.5 ssi receive operation

17.5.1 setup ssi_ctl

write the ssi_ctl to reset and enable the receiver. both the transmitter and receiver must be reset simultaneously. this will set all registers and internal logic to the same state as after a power-up reset. the recommended procedure is to set up all receiver-related control bits before performing an rxe assert. in particular, fields tcp, rsd, io1, io2, fms, fsp, mod and tms should not be changed after enabling the receiver until after the next receiver reset.

the direction of shift in the rxsr, the mode, and the clock edge polarity must also be configured in ssi_ctl. set the framing controls according to the external communication circuit's requirements. note that the rx and tx machines share the framing and clock divide controls.

if the dspcpu does not poll the ssi status registers, it should enable the receiver interrupt and set the ils field by writing to the ssi_ctl to allow interrupt-driven servicing of the ssi receiver. note that both transmit and receive use the same ils field.

if the rxclk is double the frequency of the data rate on the ssi bus, ssi_csr.cd2 can be used to divide the receive clock by two.

17.5.2 operation details

the receive state machine will begin shifting ssi_rxdata into the rxsr on the first active edge of ssi_rxclk received after the receiver is enabled (see also figure 17-7). when full, the rxsr is parallel transferred to the first available rxfifo entry and possibly ssi_rxdr. reception continues, and when rxsr is full again, a parallel load of the next available rxfifo entry from rxsr is performed. this continues until the receiver is disabled or reset.

if the receive state machine must transfer rxsr into one of the rxfifo entries and none of the rxfifo entries is available, the value will be lost and the receive overrun bit will be set.
17.5.3 interrupt and status

the status of the rxfifo is visible in ssi_csr. war is the number of 32-bit words available for read; rdf indicates that this number is more than ils. as the receive state machine loads rxfifo from the rxsr, it sets the associated status bit. the ssi will generate an internal interrupt when the number of full entries in rxfifo is more than ssi_ctl.ils. if the receive state machine attempts to load rxfifo while none of the rxfifo entries is available, it will set the receive overrun bit and generate an interrupt.

due to the possibility of speculative reading of the ssi_rxdr, the dspcpu must explicitly indicate a successful read of ssi_rxdr by writing a '1' in the lsb to the ssi_rxack register.

the status fields of the ssi_csr will update within 1 highway clock cycle after completion of writing to the ssi_rxack register.

17.6 frame timing

the frame timing can be controlled by the fss and vss fields in the ssi_ctl register.

the fss[3:0] bits control the divide ratio for the programmable frame rate divider used to generate the frame sync pulses. the valid value ranges from 1 to 16 slots of 16 bits each, e.g. a value of 5 indicates that a frame contains 5 slots of 16 bits each. note: the value '16' is accomplished by storing a '0' in this field. if a codec is connected which generates 6 slots and the ssi block is programmed to 5 slots, a framing error is indicated in ssi_csr.fes; and if tie or rie is enabled, an interrupt is generated. for an example of a frame timing diagram see figure 17-11 and figure 17-12.

the vss[3:0] bits control the number of valid slots in the frame, starting from slot 1. for example, if the vss[3:0] bits are set to 4 and fss is set to 5, slots 1, 2, 3 and 4 in the frame contain valid data from the transmitter fifo and slot 5 will contain non-valid data. the receiver will only accept data in slots 1, 2, 3 and 4.

figure 17-7. the receive buffer operation (diagram not reproduced; the 16-bit rxsr fed by ssi_rxdata is written at wr_ptr into the 32-entry deep, 16-bit wide buffer, which is read at rd_ptr through the 32-bit mmio register ssi_rxdr to the highway)
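the fss and vss encoding of section 17.6, where a slot count of 16 is stored as 0, can be captured in a few lines. the function names below are hypothetical, and the sketch only models the field arithmetic, not the hardware.

```c
/* encode a slot count of 1..16 into a 4-bit fss or vss field value.
 * 16 is stored as 0; returns -1 if the count is out of range. */
int ssi_slots_encode(unsigned slots)
{
    if (slots < 1u || slots > 16u)
        return -1;
    return (int)(slots & 0xFu);
}

/* decode a 4-bit fss or vss field value back into a slot count */
unsigned ssi_slots_decode(unsigned field)
{
    field &= 0xFu;
    return field ? field : 16u;
}

/* nonzero if slot (numbered from 1) carries valid data for the given
 * fss and vss field values */
int ssi_slot_is_valid(unsigned slot, unsigned fss_field, unsigned vss_field)
{
    unsigned frame_slots = ssi_slots_decode(fss_field);
    unsigned valid_slots = ssi_slots_decode(vss_field);

    return slot >= 1u && slot <= frame_slots && slot <= valid_slots;
}
```

with fss = 5 and vss = 4, as in the example in the text, slots 1 to 4 are reported valid and slot 5 is not.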
17.7 interrupt generation

depending on the settings of the tie, rie and cde bits in the ssi_ctl register, the ssi unit can generate interrupts. this is best illustrated by figure 17-8. note: rxfes and txfes are the internal receive and transmit framing error conditions. when an ssi interrupt is detected, the interrupt service routine should check all status bits. the interrupts should be set up as level-triggered interrupts.

figure 17-8. interrupt generation logic (diagram not reproduced; the ssi interrupt is the or of three terms: tie and (tde or tue or txfes), rie and (rdf or roe or rxfes), and cde and cds)

17.8 16-bit endian-ness and shift direction

the ssi unit supports both access orders for the 16-bit halves of a machine word. in addition, the shift direction can be controlled to select msb or lsb shifting first. the ssi_ctl.ems bit controls the 16-bit endian mode, and the tsd and rsd bits control transmit and receive shift direction.

when ems is set, the first data word received in a frame will be transferred to bits 15-0 of the ssi_rxdr, and the second word will be transferred to bits 31-16 of the ssi_rxdr. ems = '0' reverses the order of the halves of ssi_rxdr. likewise in the transmitter, when ems is set, the first data word transmitted in a frame will be bits 15-0 of ssi_txdr, and the second word transmitted will be bits 31-16 of ssi_txdr.

tsd and rsd control the shift direction of the transmit and receive shift registers (txsr and rxsr). transmit data is transmitted msb first when tsd is '0', or lsb first otherwise. receive data is received msb first when rsd equals '0', lsb first otherwise. for an example of the transmit operation see figure 17-9. receive works the same way, except that data is shifted in.

figure 17-9. 16-bit endian and shift direction operation (waveforms not reproduced; for each setting the order of the ssi_txdr bits d31 ... d0 on ssi_txdata, relative to ssi_rxfsx, is as follows; the 3rd word starts again with the 1st-word bits of the next ssi_txdr value)

setting             1st word      2nd word
ems = 1, tsd = 0    d15 ... d0    d31 ... d16
ems = 1, tsd = 1    d0 ... d15    d16 ... d31
ems = 0, tsd = 0    d31 ... d16   d15 ... d0
ems = 0, tsd = 1    d16 ... d31   d0 ... d15
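the four cases of figure 17-9 follow from two independent choices: ems selects which 16-bit half of ssi_txdr goes out first, and tsd selects the bit direction inside each half. the sketch below, with a hypothetical function name, lists the ssi_txdr bit numbers in the order in which they would appear on ssi_txdata for one 32-bit value.

```c
#include <stdint.h>

/*
 * fill order[0..31] with the ssi_txdr bit numbers in transmission order.
 * ems != 0 : bits 15-0 form the first word, bits 31-16 the second
 * ems == 0 : bits 31-16 form the first word, bits 15-0 the second
 * tsd == 0 : each word is sent msb first; tsd != 0 : lsb first
 */
void ssi_tx_bit_order(int ems, int tsd, uint8_t order[32])
{
    int half, i;

    for (half = 0; half < 2; half++) {
        /* lowest bit number of the 16-bit half sent at this position */
        int base = ((half == 0) == (ems != 0)) ? 0 : 16;

        for (i = 0; i < 16; i++)
            order[16 * half + i] = (uint8_t)(base + (tsd ? i : 15 - i));
    }
}
```

for ems = 1 and tsd = 0 the order starts with bit 15 and ends with bit 16, matching the first waveform of the figure. the receive direction would use the same mapping with rsd in place of tsd.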
17.9 ssi test modes

the ssi unit has two test modes which can be controlled by setting ssi_csr.tms. a remote and a local loopback test mode are supported (see also table 17-9).

17.9.1 remote loopback

this test mode allows a remote transmitter to test itself, the intervening transmission media, and its associated receiver. in this mode, the data received on the ssi_rxdata pin is buffered and transmitted on the ssi_txdata pin. the data is not transferred to ssi_txdr/txfifo and the dspcpu is never interrupted. the transmitter is clocked by the ssi_rxclk pin with a combinatorial clock delay.

17.9.2 local loopback

this test mode allows the dspcpu to run local checks of the ssi. data written to the txfifo is serialized and passed to the receiver via an internal serial connection. the receiver deserializes the data and passes it to the rxfifo register. interrupts will be generated if enabled. during local loopback mode, the data on the ssi_rxdata pin is ignored and the ssi_txdata pin is tristated. an external clock must be provided during local loopback mode or no transmission or reception will occur.

17.10 mmio registers

the mmio control and status registers are shown in figure 17-10. the register fields are described in table 17-5, table 17-6, table 17-7, table 17-8, and table 17-9. to ensure compatibility with future devices, any undefined mmio bits should be ignored when read, and written as '0's.

figure 17-10. ssi mmio registers (diagram not reproduced; the register offsets from mmio_base shown in the figure are listed below, the reset values shown are 0x00f00000 and 0x0000f000, and the field names shown are txr, rxr, txe, rxe, tcp, rcp, tsd, rsd, io1, io2, wio1, wio2, tie, rie, fss, vss, fms, fsp, mod, ems, ils, ctue, sroe, cfes, ccds, tms, cde, cd2, slp, tue, tde, roe, rdf, fes, cds, rio1, rio2, waw, war, txdata, rxdata and rx_ack)

register    access   mmio_base offset
ssi_ctl     r/w      0x10 2c00
ssi_csr     r/w      0x10 2c04
ssi_txdr    w/o      0x10 2c10
ssi_rxdr    r/o      0x10 2c20
ssi_rxack   w/o      0x10 2c24
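as an illustration of the ssi_ctl layout, the sketch below assembles a register value from individual fields. the bit positions are those given in table 17-5; the struct and function names are hypothetical. the txr and rxr action bits are left out on purpose, since they trigger a reset rather than hold configuration.

```c
#include <stdint.h>

/* configuration fields of ssi_ctl (bit positions per table 17-5) */
typedef struct {
    unsigned txe, rxe;     /* bits 29, 28 */
    unsigned tcp, rcp;     /* bits 27, 26 */
    unsigned tsd, rsd;     /* bits 25, 24 */
    unsigned io1, io2;     /* bits 23-22, 21-20 */
    unsigned wio1, wio2;   /* bits 19, 18 */
    unsigned tie, rie;     /* bits 17, 16 */
    unsigned fss, vss;     /* bits 15-12, 11-8 */
    unsigned fms, fsp;     /* bits 7, 6 */
    unsigned mod, ems;     /* bits 5, 4 */
    unsigned ils;          /* bits 3-0 */
} ssi_ctl_fields;

/* mask a value to the given width and move it to bit position lsb */
static uint32_t ssi_field(unsigned value, unsigned width, unsigned lsb)
{
    return ((uint32_t)value & ((1u << width) - 1u)) << lsb;
}

/* build the 32-bit ssi_ctl value; txr and rxr are always written as 0 */
uint32_t ssi_ctl_pack(const ssi_ctl_fields *f)
{
    return ssi_field(f->txe, 1, 29) | ssi_field(f->rxe, 1, 28)
         | ssi_field(f->tcp, 1, 27) | ssi_field(f->rcp, 1, 26)
         | ssi_field(f->tsd, 1, 25) | ssi_field(f->rsd, 1, 24)
         | ssi_field(f->io1, 2, 22) | ssi_field(f->io2, 2, 20)
         | ssi_field(f->wio1, 1, 19) | ssi_field(f->wio2, 1, 18)
         | ssi_field(f->tie, 1, 17) | ssi_field(f->rie, 1, 16)
         | ssi_field(f->fss, 4, 12) | ssi_field(f->vss, 4, 8)
         | ssi_field(f->fms, 1, 7) | ssi_field(f->fsp, 1, 6)
         | ssi_field(f->mod, 1, 5) | ssi_field(f->ems, 1, 4)
         | ssi_field(f->ils, 4, 0);
}
```

packing only io1 = 3 and io2 = 3 yields 0x00f00000, which agrees with the documented hardware reset value of ssi_ctl.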
philips semiconductors synchronous serial interface preliminary specification 17-9 17.10.1 ssi control register (ssi_ctl) ssi_ctl is a 32-bit read/write control register used to direct the operation of the ssi. the value of this register after a hardware reset is 0x00f00000. table 17-5. ssi control register (ssi_ctl) fields. field description txr transmitter software reset (bit 31) . setting txr performs the same functi ons as a hardware reset. resets all transmitter functions. a transmission in progress is interrupted and the data remaining in the txsr is lost. the txfifo pointers are reset and the data contained will not be transmitted, but the dat a in the ssi_txdr and/or txfifo are not explicitly delet ed. the transmitter status and interrupts are all cleared. this is an action bit. this bit always reads ?0?. writing a ?1? in combinat ion with writing a ?1? in the rxr field w ill initiate a reset for the ssi module. note: this bit is always set together with rxr because a se parate transmitter or receiv er reset is not implemented. rxr receiver software reset (bit 30). setting rxr performs the same functions as a hardware reset. resets all receiver functions. a reception in pr ogress is interrupted and the data collec ted in the rxsr is lost. the rxfifo pointers are reset, and the ssi will not generate an interr upt to dspcpu to retrieve data in the ssi_rxdr and/or rxfifo. the data in the ssi_rxdr and/or rxfifo is not exp licitly deleted. the receiver status and interrupts are all cleared.this is an action bit.this bit al ways reads ?0?. writing a ?1? in combinat ion with writing a ?1? in the txr field will initiate a reset for the ssi module. note: this bit is always set together with txr, because a separate transmitter or receiver reset is not implemented. txe transmitter enable (bit 29). txe enables the operation of the trans mit shift register state machine. 
when txe is set and a frame sync is detected, the transmi t state machine of the ssi is begins transmission of t he frame. when txe is cleared, the transmitter will be disa bled after completing transmission of data currently in the txsr. the serial out- put (ssi_txdata) is three-stated, and any data present in ssi_txdr and/or txfifo will not be transmitted (i.e., data can be written to ssi_txdr with txe cleared; tde can be cleared, but data will not be transfe rred to the txsr). status fields updated by the transmit st ate machine are not updated or reset w hen an active transmitter is disabled. rxe receive enable (bit 28). when rxe is set, the receive stat e machine of the ssi is enabled. when this bit is cleared, the receiver will be disabled by inhi biting data transfer into ssi_rxdr and/or rxfifo. if data is being received while this bit is cleared, the remainder of that 16-bit word will be shifted in and transferred to the ssi rxfifo and/or ssi_rxdr. status fields updated by the receive state machine are not updated or reset when an active receiver is disabled. tcp transmit clock polarity (bit 27). the tcp bit value should only be changed when the trans mitter is disabled. tcp controls on which edge of txclk data is output. tcp=0 caus es data to be output at rising edge of txclk, tcp=1 causes data to be output at falling edge of txclk. rcp receive clock polarity (bit 26). rcp controls which edge of rxclk samples data. the data is sampled at rising edge when rcp = ?1? or falling edge when rcp = ?0?. tsd transmit shift direction (bit 25). tsd controls the shift direction of transmi t shift register (txsr). transmit data is transmitted msb first when tsd = ?0? or lsb first otherwise. the oper ation of this bit is explained in more detail in section 17.8 . rsd receive shift direction (bit 24). the rsd bit value s hould only be changed when the receiver is disabled. rsd con- trols the shift direction of receive shif t register (rxsr). 
receive data is received msb first when rsd = '0', lsb first otherwise. the operation of this bit is explained in more detail in section 17.8.

io1  mode select ssi_io1 pin (bits 23-22). the io1 field value should only be changed when the transmitter and receiver are disabled. the io1[1:0] bits are used to select the function of the ssi_io1 pin. the function may be selected as listed in table 17-6.

io2  mode select ssi_io2 pin (bits 21-20). the io2 field value should only be changed when the transmitter and receiver are disabled. the io2[1:0] bits are used to select the function of the ssi_io2 pin. the function may be selected according to table 17-7.

wio1  write io1 (bit 19). value written here appears on the ssi_io1 pin when the pin is configured to be a general purpose output.

wio2  write io2 (bit 18). value written here appears on the ssi_io2 pin when this pin is configured to be a general purpose output.

tie  transmit interrupt enable (bit 17). enables interrupt by the tde flag in the ssi status register (transmit needs refill). also enables interrupt of the tue (transmitter underrun error) and txfes (transmit framing error).

rie  receive interrupt enable (bit 16). when rie is set, the dspcpu will be interrupted when rdf in the ssi status register is set (receive complete). it will also be interrupted on roe (receiver overrun error) and on rxfes (receive framing error).

fss  frame size select (bits 15-12). the fss[3:0] bits control the divide ratio for the programmable frame rate divider used to generate the frame sync pulses. the valid setup value ranges from 1 to 16 slot(s). the value '16' is accomplished by storing a '0' in this field.
vss  valid slot size (bits 11-8). the vss[3:0] bits control the valid slot size (starting from slot 1) for different modem analog front end devices. the valid setup value ranges from 1 to 16 slot(s). the value '16' is accomplished by storing a '0' in this field.

fms  frame sync mode select (bit 7). the fms bit value should only be changed when the transmitter and receiver are disabled. fms selects the type of frame sync to be recognized by both rx and tx. when fms = '1', frame sync is word-length bit clock. when this bit = '0', frame sync is a 1-bit clock.

fsp  frame sync polarity (bit 6). the fsp bit value should only be changed when the transmitter and receiver are disabled. fsp controls which edge of frame sync is the active edge for both rx and tx. this bit causes the frame signal to be active at the rising edge when fsp = '0', or at the falling edge when fsp = '1'.

mod  mode select (bit 5). the mod bit value should only be changed when the transmitter and receiver are disabled. mod selects the operational mode of the ssi for isdn functionality. when mod is set, the ssi is configured as a u-interface for isdn nt. otherwise, set to '0'. setting the mod bit and cd2 supports the mc145574 and mc145572 isdn interface transceivers.

ems  endian mode select (bit 4). selects the big- or little-endian mode operation. see section 17.8 for more detail.

ils  interrupt level select (bits 3-0). sets the point where an interrupt is generated for normal data buffer servicing. the number ranges from 1 to 15. this field controls the interrupt level of both transmit and receive functions.

table 17-6. io1 mode select

bits  mode
00    general purpose output: configures the ssi_io1 pin for general purpose output. the pin follows the state of the wio1 field of the ssi_ctl.
01    general purpose input: change detector may be used.
      value can be read in from the rio1 field of the ssi_csr.
10    enable external txclk: allows for use of an externally generated txclk. the clock is provided via the txclk pin. all general purpose i/o functions are unavailable.
11    disable: pin is not used. output buffer is tristated and the input is ignored. (reset default)

table 17-7. io2 mode select

bits  mode
00    general purpose output: configures the ssi_io2 pin as a general purpose output. the pin follows the state of the wio2 field of the ssi_ctl.
01    general purpose input: value can be read in from the rio2 field of the ssi_csr.
10    frame signal txfsx (output): outputs the frame signal generated by the internal frame signal generation logic.
11    frame signal txfsx (input): allows for use of an externally generated txfsx. the frame sync signal is provided via the txfsx pin. all general purpose i/o functions are unavailable. (reset default)
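
as an illustration of the field layout in table 17-5, the following c sketch composes an ssi_ctl value. the macro and function names (SSI_CTL_*, ssi_slots_to_field, ssi_ctl_compose, ssi_ctl_reset_bits) are hypothetical and do not come from any philips header; only the bit positions, the mode encodings and the rule that 16 slots is stored as 0 are taken from the tables above. writing the value to the actual mmio register is not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* bit positions from table 17-5; the macro names are illustrative only */
#define SSI_CTL_TXR        (1u << 31)
#define SSI_CTL_RXR        (1u << 30)
#define SSI_CTL_TXE        (1u << 29)
#define SSI_CTL_RXE        (1u << 28)
#define SSI_CTL_IO1_SHIFT  22
#define SSI_CTL_IO2_SHIFT  20
#define SSI_CTL_FSS_SHIFT  12
#define SSI_CTL_VSS_SHIFT  8
#define SSI_CTL_ILS_SHIFT  0

/* fss and vss accept 1..16 slots; 16 is stored as 0 in the 4-bit field */
static uint32_t ssi_slots_to_field(unsigned slots)
{
    assert(slots >= 1 && slots <= 16);
    return (uint32_t)(slots & 0xFu);
}

/* compose an ssi_ctl value from pin modes (tables 17-6 and 17-7),
 * frame size, valid slot size and interrupt level */
static uint32_t ssi_ctl_compose(unsigned io1_mode, unsigned io2_mode,
                                unsigned frame_slots, unsigned valid_slots,
                                unsigned ils)
{
    assert(io1_mode <= 3 && io2_mode <= 3);
    assert(ils >= 1 && ils <= 15);
    return ((uint32_t)io1_mode << SSI_CTL_IO1_SHIFT)
         | ((uint32_t)io2_mode << SSI_CTL_IO2_SHIFT)
         | (ssi_slots_to_field(frame_slots) << SSI_CTL_FSS_SHIFT)
         | (ssi_slots_to_field(valid_slots) << SSI_CTL_VSS_SHIFT)
         | ((uint32_t)ils << SSI_CTL_ILS_SHIFT);
}

/* txr and rxr are always written together: a separate transmitter
 * or receiver reset is not implemented */
static uint32_t ssi_ctl_reset_bits(void)
{
    return SSI_CTL_TXR | SSI_CTL_RXR;
}
```

for example, ssi_ctl_compose(3, 3, 16, 16, 1) yields 0x00f00001: both pins in their reset-default mode 11, 16 frame slots, 16 valid slots and an interrupt level of 1.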
17.10.2 ssi control/status register (ssi_csr)

ssi_csr is a 32-bit read/write register that controls the ssi unit and shows the current status of the ssi module. the default value after hardware reset is 0x0000f000.

table 17-8. ssi control/status register (ssi_csr) fields

field  description

tms  test mode select (bits 31-30). value should only be changed when the transmitter and receiver are disabled. see table 17-9.

cde  change detector enable (bit 29). cde enables the change detector function on the ssi_io1 pin. when cde is set, the dspcpu will be interrupted when cds in the ssi status register is set. when cde is cleared, this interrupt is disabled. however, the cds bit will always indicate the change detector condition. when the change detector is enabled, the clk samples ssi_io1. the cds bit will be set for either a '0' -> '1' or a '1' -> '0' change between the current value and the stored value.

cd2  rxclk divider (bit 28). when cd2 = '1', the internal rxclk is divided by two. in the divide by 2 mode, the clock edge that samples the asserted frame sync pulse will resync the rxclk divider to be a data capture edge. data samples will occur every other clock thereafter until the end of the valid slots in the frame.

slp  sleepless (bit 27). when set, this bit allows the ssi to ignore the global power down signal. if cleared, assertion of the global power down signal will cause the ssi transmitter to finish transmission of the current 16-bit word, then enter a state similar to transmitter disabled (ssi_ctl.txe = '0'). in the receiver, a 16-bit word currently being transmitted to rxsr will complete reception and be transferred to the rxfifo. the receiver will then enter a state similar to receiver disabled (ssi_ctl.rxe = '0').

ctue  clear transmitter underrun error (bit 21).
a control bit written by the dspcpu to indicate that the transmitter underrun error flag should be cleared. this is an action bit. writing a '1' clears ssi_csr.tue. the bit always reads '0'.

croe  clear receiver overrun error (bit 20). a control bit written by the dspcpu to indicate that the receiver overrun error flag should be cleared. this is an action bit. writing a '1' clears ssi_csr.roe. the bit always reads '0'.

cfes  clear framing error status (bit 19). a control bit written by the dspcpu to indicate that the receiver's framing error flag should be cleared. this is an action bit. writing a '1' clears ssi_csr.fes. the bit always reads '0'.

ccds  clear change detector status (bit 18). a control bit written by the dspcpu to indicate that the change detector status on io1 flag should be cleared. this is an action bit. writing a '1' clears ssi_csr.cds. the bit always reads '0'.

waw  word buffers available for write (bits 15-12). the waw[3:0] bits provide the number of 32-bit words available for write in the transmit buffer (txfifo). the ssi can store 15 words in the transmit fifo. when the fifo is empty, waw = '15'. when the fifo is full, waw = '0' and the ssi will ignore any further attempts to add words to the fifo. note: the fill routine should check that waw is nonzero before writing data.

war  word buffers available for read (bits 11-8). the war[3:0] bits provide the number of 32-bit words available for read in the receive buffer (rxfifo). the ssi can store 16 words in the receive fifo. however, the maximum value indicated by the war field is '15' (because it is a 4-bit register field). when the fifo is empty, war = '0'. when the fifo is full, war = '15' and the ssi will generate an overrun error if more data is received.

tde  transmit data register empty (bit 7). in normal operation, this bit will be set when the number of empty words in the txfifo is greater than the interrupt level select value, ssi_ctl.ils.
if ssi_ctl.tie is set, the ssi will generate an interrupt. when set, it indicates that the ssi_txdr/txfifo registers require dspcpu service for refilling after normal transmission. as the dspcpu refills the txfifo during the interrupt service routine, this bit will be cleared by the ssi when the number of empty slots drops below the value of ssi_ctl.ils.

rdf  receive data register full (bit 6). in normal operation, this bit will be set when the number of words in the rxfifo is greater than ssi_ctl.ils. if ssi_ctl.rie is set, the ssi will generate an interrupt. when set, this bit indicates that normal received data resides in the ssi_rxdr register and rxfifo buffer for reading. the dspcpu must service the rxfifo before a receiver overrun occurs.

tue  transmitter underrun error (bit 5). no current data was available from the txfifo when a load of the txsr was scheduled. the transmitted message may have been corrupted. generates an interrupt if enabled by tie.

roe  receive overrun error (bit 4). no rxfifo slot was available in which to store received data. these bits have been lost and the message stream is incomplete. generates an interrupt if enabled by rie.

fes  frame error (bit 3). a frame sync pulse has been detected where not expected or did not occur as expected during transmit or receive. received data may be invalid. transmit data have been sent out of sync. receive frame error rxfes generates an interrupt if enabled by rie. transmit frame error txfes generates an interrupt if enabled by tie.

cds  change detector status (bit 2). the input change detector on the ssi_io1 pin has detected a change in state.

rio1  read io1 (bit 1). rio1 reflects the value on the ssi_io1 pin.

rio2  read io2 (bit 0). rio2 reflects the value on the ssi_io2 pin.
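
the status fields of table 17-8 can be decoded as in the following c sketch. it shows how a fill routine might bound a write burst by waw, and how the action bits map onto the error flags they clear. the names (SSI_CSR_*, ssi_csr_waw, ssi_csr_war, ssi_tx_burst, ssi_csr_clear_bits) are hypothetical; the functions operate on a value already read from ssi_csr and do not touch hardware. a real driver writing the action bits back would also have to preserve the tms, cde, cd2 and slp control bits, which this sketch leaves out.

```c
#include <assert.h>
#include <stdint.h>

/* field positions from table 17-8; the macro names are illustrative only */
#define SSI_CSR_CTUE       (1u << 21)
#define SSI_CSR_CROE       (1u << 20)
#define SSI_CSR_CFES       (1u << 19)
#define SSI_CSR_CCDS       (1u << 18)
#define SSI_CSR_WAW_SHIFT  12
#define SSI_CSR_WAR_SHIFT  8
#define SSI_CSR_TUE        (1u << 5)
#define SSI_CSR_ROE        (1u << 4)
#define SSI_CSR_FES        (1u << 3)
#define SSI_CSR_CDS        (1u << 2)

/* number of 32-bit words that can still be written to the txfifo */
static unsigned ssi_csr_waw(uint32_t csr)
{
    return (csr >> SSI_CSR_WAW_SHIFT) & 0xFu;
}

/* number of 32-bit words reported as readable from the rxfifo */
static unsigned ssi_csr_war(uint32_t csr)
{
    return (csr >> SSI_CSR_WAR_SHIFT) & 0xFu;
}

/* how many of the pending words a fill routine may write right now;
 * returns 0 when waw is 0, so the caller never writes to a full fifo */
static unsigned ssi_tx_burst(uint32_t csr, unsigned pending)
{
    unsigned room = ssi_csr_waw(csr);
    assert(room <= 15u);            /* waw is a 4-bit field */
    return pending < room ? pending : room;
}

/* action bits that clear every error/status flag currently set */
static uint32_t ssi_csr_clear_bits(uint32_t csr)
{
    uint32_t clr = 0;
    if (csr & SSI_CSR_TUE) clr |= SSI_CSR_CTUE;
    if (csr & SSI_CSR_ROE) clr |= SSI_CSR_CROE;
    if (csr & SSI_CSR_FES) clr |= SSI_CSR_CFES;
    if (csr & SSI_CSR_CDS) clr |= SSI_CSR_CCDS;
    return clr;
}
```

with the documented reset value 0x0000f000, ssi_csr_waw returns 15 and ssi_csr_war returns 0, which matches an empty transmit fifo and an empty receive fifo.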
17.11 timing diagrams

figure 17-11 and figure 17-12 illustrate the timing of the data signals and the frame timing.

17.12 power down

the ssi block can be separately powered down by setting a bit in the block_power_down register. for a description of powerdown, see chapter 21, 'power management.' the ssi block should not be active when applying block powerdown. if the block enters power-down state while transmission is enabled, behavior upon power-up is undefined.

table 17-9. test mode select

bits  mode
0x    normal operation.
10    remote loopback test: direct connection of receiver serial data to transmitter serial data. transmitter is clocked with rxclk. no data is loaded to the ssi_rxdr register or rxfifo buffer and no cpu interrupt is generated. useful to allow a remote device to test the communication medium and the rx and tx front ends.
11    local loopback test: feedback is after the ssi_txdr and ssi_rxdr registers and serializer/deserializer. allows the dspcpu to test the bulk of the rx and tx circuits. during local loopback test, an external clock on ssi_rxclk should be present to clock the ssi unit.

figure 17-11. ssi serial timing (fsp = 0, rsd = 0, tsd = 0, tcp = 0, rcp = 0, fms = 0). [waveforms of ssi_rxclk, ssi_rxfsx, ssi_rxdata and ssi_txdata; each 16-bit word is shown as bits d15 down to d0.]

figure 17-12. ssi serial timing (fsp = 0, rsd = 0, tsd = 0, tcp = 0, rcp = 0, fms = 0, fss = 5, vss = 4). [waveforms of ssi_rxclk, ssi_rxfsx, ssi_rxdata and ssi_txdata; the 1st frame carries the 1st, 2nd, 3rd and 4th data words, followed by the 1st data word of the 2nd frame.]
jtag functional specification
chapter 18

by renga sundararajan, hans bouwmeester and frank bouwman

18.1 overview

in this document, the generic pnx1300 name refers to the pnx1300 series, or the pnx1300/01/02/11 products.

the ieee 1149.1 (jtag) standard can be used for various purposes including testing connections between integrated circuits on board level, controlling the testing of the internal structures of the integrated circuits, and monitoring and communicating with a running system.

the jtag standard defines on-chip test logic, four or five dedicated pins collectively called the test access port (tap) and a tap controller.

the jtag standard defines instructions that must always be implemented by a tap controller in order to guarantee correct behavior on board level. apart from mandatory instructions, the standard also allows user-defined and private instructions. in pnx1300, user-defined and private instructions exist for debug purposes and for production test. for debug there is communication between a debug monitor running on the pnx1300 dspcpu and a debugger front-end running on a host computer. this will be explained in section 18.3.

18.2 test access port (tap)

the test access port includes three or four dedicated input pins and one output pin:
• tck (test clock)
• tms (test mode select)
• tdi (test data in)
• trst (test reset, optional!)
• tdo (test data out)

trst is not present on pnx1300.

tck provides the clock for test logic required by the standard. tck is asynchronous to the system clock. stored state devices in the jtag controller must retain their state indefinitely when tck is stopped at 0 or 1.

the signal received at tms is decoded by the tap controller to control test functions. the test logic is required to sample tms at the rising edge of tck.

serial test instructions and test data are received at tdi. the tdi signal is required to be sampled at the rising edge of tck.
when test data is shifted from tdi to tdo, the data must appear without inversion at tdo after a number of rising and falling edges of tck determined by the length of the instruction or test data register selected.

tdo is the serial output for test instructions and data from the tap controller. changes in the state of tdo must occur at the falling edge of tck. this is because devices connected to tdo are required to sample tdo at the rising edge of tck. the tdo driver must be in an inactive state (i.e., tdo line highz) except when data scanning is in progress.

18.2.1 tap controller

the tap controller is a finite state machine; it synchronously responds to changes in the tck and tms signals. the tap instructions and data are serially scanned into the tap controller's instruction and data registers via the common input line tdi. the tms signal tells the tap controller to select either the tap instruction register or a tap data register as the destination for serial input from the common line tdi. an instruction scanned into the instruction register selects a data register to be connected between tdi and tdo and hence to be the destination for serial data input.

tap controller state changes are determined by the tms signal. the states are used for scanning in/out tap instructions and data, updating instruction and data registers, and for executing instructions.

the controller state diagram (figure 18-1) shows separate states for 'capture', 'shift' and 'update' of data and instructions. the reason for separate states is to leave the contents of a data register or an instruction register undisturbed until serial scan-in is finished and the update state is entered. by separating the shift and update states, the contents of a register (the parallel stage) are not affected during scan in/out.

the tap controller must be in the test logic reset state after power-up. it remains in that state as long as tms is held at '1'.
it transitions to the run-test/idle state when tms = '0'. the run-test/idle state is an idle state of the controller in between scanning in/out an instruction/data register. the 'run-test' part of the name refers to the start of built-in tests. the 'idle' part of the name refers to all other cases.

note that there are two similar sub-structures in the state diagram, one for scanning in an instruction and another for scanning in data. to scan in/out a data register, one has to scan in an instruction first.

an instruction or data register must have at least two stages, a shift register stage and a parallel input/output stage. when an n-bit data register is to be 'read', the register is selected by an instruction. the register's contents are 'captured' first (loaded in parallel into the shift register stage), n bits are shifted in and at the same time n bits
are shifted out. finally the register is 'updated' with the new n bits shifted in.

note: when a register is scanned, its old value is shifted out of tdo. the new value shifted in via tdi is written to the register at the update state. hence, scan in and scan out involve the same steps. this also means that reading a register via jtag destroys its contents unless otherwise stated. we can specify some registers as read-only via jtag so that when the controller transitions to the update state for the read-only register, the update has no effect. sometimes, read-write registers are needed (for example, control registers used for handshake) which can be read non-destructively. in such cases, the value shifted in determines whether the old value is 'remembered' or something else happens.

18.2.2 pnx1300 jtag instruction set

pnx1300 uses a 5-bit instruction register. the unspecified opcodes are private and their effects are undefined. table 18-1 lists the jtag instructions.

figure 18-1. state diagram of tap controller. [states shown: test logic reset, run-test/idle, select dr scan, capture dr, shift dr, exit1 dr, pause dr, exit2 dr, update dr, select ir scan, capture ir, shift ir, exit1 ir, pause ir, exit2 ir, update ir; each transition is labeled with the tms value (0 or 1) that causes it.]

table 18-1. jtag instruction encoding

encoding  instruction name  action
00000     extest            select (dummy) boundary scan register
00001     sample/preload    select (dummy) boundary scan register
11111     bypass            select bypass register
10000     reset             reset trimedia to power on state
10001     sel_data_in       select data_in register
10010     sel_data_out      select data_out register
10011     sel_ifull_in      select ifull_in register
10100     sel_ofull_out     select ofull_out register
10101     sel_jtag_ctrl     select jtag_ctrl register
11110     macro             hardware test mode select
01010     burnin            private
01110     pass_c_s          private

the jtag instructions extest, sample/preload, and bypass are standard instructions and are not discussed here. the macro, burnin, and pass_c_s instructions are used during hardware test mode, and are also not discussed here. all other instructions are discussed in section 18.3.

18.3 using jtag for pnx1300 debug

figure 18-2 shows an overview of the jtag access path from a host machine to a target trimedia system and a simplified block diagram of the trimedia processor. the jtag interface module shown separately in the diagram may be a pc add-on card such as the pc-1149.1/100f boundary scan controller board from corelis inc. or a similar module connected to a pc serial or parallel port.

figure 18-2. trimedia system with jtag test access. [block diagram: a host machine (such as a pc) connects through a serial or parallel connection to a jtag interface module (may be a pc plug-in board), which drives the jtag tap (tck, tms, tdi, tdo) through the jtag board connector on the trimedia board; the scan chain possibly connects other chips on the board. inside the trimedia processor: jtag controller, mmio, dsp cpu with i$ and d$, data highway, and mmi to main memory (sdram).]

the jtag interface module is necessary only for trimedia systems that are not plugged into a pc. for pc-hosted trimedia systems, the host based debugger front-end can communicate with the target resident debug monitor via the pci bus.

the enhancements to the standard functionality of jtag test logic provide a handshake mechanism for transferring data to and from a trimedia processor's mmio registers reserved for this purpose, for posting an interrupt, and for resetting processor state. the actual interpretation of the contents of the mmio registers is determined by a software protocol used by the debug monitor running on the trimedia processor and the debug front-end running on a host machine.

the communication between a host computer and a target trimedia system via jtag requires, at a high level of abstraction, the following components.

• a host computer with a serial or parallel interface. the host computer transfers data to and from the jtag interface module, preferably in word-parallel fashion. a jtag interface device driver is also needed to access and modify the registers of the jtag interface module.

• a jtag interface module (hardware) that asynchronously transfers data to and from the host computer. the interface module synchronously transfers data to and from the jtag tap on a trimedia processor, and supplies the test clock, tck, and other signals to the trimedia jtag controller. the interface module may be a pc plug-in board. this module may transfer data from and to the host computer in bit-serial or word-parallel fashion. it transfers data from and to the jtag registers on a trimedia processor in bit-serial fashion in accordance with the ieee 1149.1 standard. the jtag interface module connects to a 4-pin jtag connector on a trimedia board which provides a path to the jtag pins on a trimedia processor. it is the responsibility of the interface module to scan data in and out of the trimedia processor into its internal buffers and make them available to the host computer.

• a jtag controller on the trimedia processor which provides a bridge between the external jtag tap and the internal system. the controller transfers data from/to the tap to/from its scannable registers asynchronous to the internal system clock. a monitor running on a trimedia processor and the debugger front-end running on a host computer exchange data via jtag by reading/writing the mmio registers reserved for this purpose, including a control register used for the handshake.

18.3.1 jtag instruction and data registers

pnx1300 has two jtag data registers and one jtag control register (see figure 18-3) in mmio space and a number of jtag instructions to manipulate those registers. table 18-2 lists the mmio addresses of the jtag data and control registers. the addresses are offsets from mmio_base. all references to instruction and data registers below are to jtag instruction and data registers and not to trimedia instruction or data registers.

• two 32-bit data registers, jtag_data_in and jtag_data_out, in mmio space. both registers can be connected in between tdi and tdo like the standard bypass and boundary scan registers of jtag (not shown in figure 18-3). the jtag_data_in register can be read or written to via the jtag port.
the jtag_data_out register is read-only via the jtag port, so that scanning out jtag_data_out is non-destructive. the jtag_data_in and jtag_data_out registers are readable/writable from the trimedia processor via the usual load/store operations.

• an 8-bit control register jtag_ctrl in mmio space. the jtag_ctrl register is used for handshake between a debug monitor running on a trimedia and a debugger front-end running on a host.
jtag_ctrl.ofull = '1' means that jtag_data_out has valid data to be scanned out. on power-on reset of the trimedia processor, jtag_ctrl.ofull = '0'. jtag_ctrl.ofull is both readable and writable via the jtag tap. writing a '0' to jtag_ctrl.ofull via jtag is a 'remember' operation, i.e., jtag_ctrl.ofull retains its previous state. writing a '1' to jtag_ctrl.ofull via jtag is a 'clear' operation, i.e., jtag_ctrl.ofull becomes '0'.
jtag_ctrl.ifull = '0' means that the jtag_data_in register is empty. jtag_ctrl.ifull = '1' means that jtag_data_in has valid data and the debug monitor has not yet copied it to its private area. on power-on reset of the trimedia processor, jtag_ctrl.ifull = '0'. jtag_ctrl.ifull is readable and writable via jtag. writing a '0' to jtag_ctrl.ifull via jtag is a 'remember' operation, i.e., jtag_ctrl.ifull retains its previous state. writing a '1' to jtag_ctrl.ifull posts an interrupt on hardware line 18.
the peripheral blocks on a trimedia processor may enter a 'power down' state to reduce power consumption. the jtag_ctrl.sleepless bit determines if the jtag block participates in a power down state. in the power-on reset state, the jtag_ctrl.sleepless bit is '1', meaning the jtag block does not power down. it can be read and written to by the trimedia processor via load/store operations and by the debugger front-end running on a host by scan in/out.

table 18-2. mmio register assignments

mmio offset  jtag register
0x10 3800    jtag_data_in
0x10 3804    jtag_data_out
0x10 3808    jtag_ctrl

figure 18-3. additional jtag data registers and control register. [jtag_data_in (bits 31-0), jtag_data_out (bits 31-0) and jtag_ctrl (bits 7-0) are each connected between tdi and tdo; jtag_ctrl holds the ifull, ofull and sleepless bits in its low-order bits, and its remaining bits are unused.]

• two virtual registers, jtag_ifull_in and jtag_ofull_out. the first virtual register jtag_ifull_in connects the registers jtag_ctrl.ifull and jtag_data_in in series. likewise, the virtual register jtag_ofull_out connects jtag_ctrl.ofull and jtag_data_out in series. the reason for the virtual registers is to shorten the time for scanning the jtag_data_in and jtag_data_out registers. without virtual registers, we must scan in an instruction to select jtag_data_in, scan in data, scan in an instruction to select the jtag_ctrl register and finally scan in the control register. with virtual registers, we can scan in an instruction to select jtag_ifull_in and then scan in both control and data bits. similar savings can be achieved for scan out using virtual registers.

• jtag instructions:
  - 5 instructions, sel_data_in, sel_data_out, sel_ifull_in, sel_ofull_out, and sel_jtag_ctrl, for selecting the registers to be connected between tdi and tdo for serial input/output.
  - an instruction reset for resetting the trimedia processor to power on state.

• in the capture-ir state of the tap controller, the 2 least significant bits (bits 0 and 1) of the shift register stage must be loaded with '01' as required in the standard. the standard allows the remaining bits of the ir shift stage to be loaded with design specific data. bits 2, 3 and 4 of the ir shift stage are loaded with bits 0, 1 and 2 of the jtag_ctrl register. this means that shifting in any instruction allows the 3 least significant bits of the jtag_ctrl register to be inspected. this reduces the polling overhead for data transfer.

race conditions

since the jtag data registers live in mmio space and are accessible by both the trimedia processor and the jtag controller at the same time, race conditions must not exist either in hardware or in software. the following communication protocol uses a handshake mechanism to avoid software race conditions.
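
before turning to the protocol, the capture-ir behaviour and the scan lengths described in section 18.3.1 can be summarized in a small c sketch. it is a host-side illustration only: the names (enum jtag_opcode, jtag_ir_capture_valid, jtag_ctrl_low_bits, jtag_dr_length) are hypothetical, the opcodes are those of table 18-1, and the 33-bit length of the virtual registers is taken from the cycle tables in section 18.3.3. which of the three returned jtag_ctrl bits is ifull and which is ofull is deliberately not decoded here; that assignment should be read from figure 18-3.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define JTAG_IR_BITS 5u

/* 5-bit opcodes from table 18-1; the enum names are illustrative only */
enum jtag_opcode {
    JTAG_SEL_DATA_IN   = 0x11, /* 10001 */
    JTAG_SEL_DATA_OUT  = 0x12, /* 10010 */
    JTAG_SEL_IFULL_IN  = 0x13, /* 10011 */
    JTAG_SEL_OFULL_OUT = 0x14, /* 10100 */
    JTAG_SEL_JTAG_CTRL = 0x15  /* 10101 */
};

/* capture-ir must load '01' into bits 1..0 of the ir shift stage */
static bool jtag_ir_capture_valid(uint8_t captured)
{
    return (captured & 0x3u) == 0x1u;
}

/* bits 4..2 of the captured ir value mirror bits 2..0 of jtag_ctrl */
static uint8_t jtag_ctrl_low_bits(uint8_t captured)
{
    assert(captured < (1u << JTAG_IR_BITS));
    return (uint8_t)((captured >> 2) & 0x7u);
}

/* number of bits scanned through the data register each opcode selects */
static unsigned jtag_dr_length(enum jtag_opcode op)
{
    switch (op) {
    case JTAG_SEL_DATA_IN:
    case JTAG_SEL_DATA_OUT:
        return 32u;
    case JTAG_SEL_IFULL_IN:
    case JTAG_SEL_OFULL_OUT:
        return 33u;             /* one control bit in series with 32 data bits */
    case JTAG_SEL_JTAG_CTRL:
        return 8u;
    }
    return 0u;                  /* not a register-select opcode */
}
```

a front-end would typically call jtag_ir_capture_valid on every value shifted out during an instruction scan and discard the sample if the check fails, since a wrong pattern suggests a broken scan chain.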
18.3.2 jtag communication protocol

the following describes the handshake mechanism for transferring data via jtag.

• transfer from debug front-end to debug monitor
the debugger front-end running on a host transfers data to a debug monitor via the jtag_data_in register. it must poll the jtag_ctrl.ifull bit to check if the jtag_data_in register can be written to. if the jtag_ctrl.ifull bit is clear, the front-end may scan data into the jtag_data_in register. note that data and control bits may be shifted in together with the sel_ifull_in instruction, and the bit shifted into jtag_ctrl.ifull must be '1'. this action triggers an interrupt. the debug monitor must copy the data from the jtag_data_in register into its private area when servicing the interrupt and then clear the jtag_ctrl.ifull bit, thus allowing the jtag interface module to write the next piece of data to the jtag_data_in register.

• transfer from monitor to front-end
the monitor running on trimedia must check if jtag_ctrl.ofull is clear and if so, it can write data to jtag_data_out. after that, the monitor must set the jtag_ctrl.ofull bit. the debugger front-end polls the jtag_ctrl.ofull bit. when that bit is set, it can scan out the jtag_data_out register and clear the jtag_ctrl.ofull bit. since jtag_data_out is read-only via jtag, the update action at the end of scan out has no effect on jtag_data_out. the jtag_ctrl.ofull bit, however, must be cleared by shifting in the value '1'.

• controller states
in the power-on reset state, jtag_ctrl.ifull and jtag_ctrl.ofull must be cleared by the jtag controller.

18.3.3 example data transfer via jtag

scanning in a 5-bit instruction will take 12 tck cycles from the run-test/idle state: 4 cycles to reach the shift-ir state, 5 cycles for actual shifting in, 1 cycle to the exit1-ir state, 1 cycle to the update-ir state, and 1 cycle back to the run-test/idle state.
likewise, scanning in a 32-bit data register will take 38 tck cycles and transferring an 8-bit jtag_ctrl data register will take 14 tck cycles from the idle state. however, if a data transfer follows an instruction transfer, then the transition to the dr scan stage can be done without going through the idle state, saving 1 cycle.

18.3.3.1 transferring data to trimedia via jtag

poll the control register to check if the input buffer is empty. scan in data when it is empty and set the ifull control bit to '1', triggering an interrupt. note that scanning in any instruction automatically scans out the 3 least significant bits (including the ifull and ofull bits) of the jtag_ctrl register.

table 18-3. transfer of data in via jtag

action                                                        number of tck cycles
ir shift in sel_ifull_in instruction                          12
while jtag_ctrl.ifull = 1, scan in sel_ifull_in instruction   11+
dr scan 33 bits of register jtag_ifull_in                     38
total                                                         61+ cycles
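
the cycle counts quoted in section 18.3.3 follow a simple pattern, sketched below in c. the function names are hypothetical, and the sketch adopts the data book's own accounting: shifting n bits is counted as n cycles, separate from the exit1 transition. the 3 cycles needed to reach shift-dr from run-test/idle are inferred from the tap state diagram (select dr scan, capture dr, shift dr) and agree with the quoted 38 and 14 cycle figures.

```c
#include <assert.h>

/* n-bit instruction scan that starts and ends in run-test/idle:
 * 4 cycles to reach shift-ir, n to shift, then exit1-ir, update-ir, idle */
static unsigned jtag_ir_scan_cycles(unsigned nbits)
{
    return 4u + nbits + 3u;
}

/* n-bit data scan that starts and ends in run-test/idle:
 * 3 cycles to reach shift-dr, n to shift, then exit1-dr, update-dr, idle */
static unsigned jtag_dr_scan_cycles(unsigned nbits)
{
    return 3u + nbits + 3u;
}

/* a scan chained to another scan skips run-test/idle and saves 1 cycle */
static unsigned jtag_chained(unsigned cycles_from_idle)
{
    assert(cycles_from_idle > 0u);
    return cycles_from_idle - 1u;
}

/* total for one 32-bit word moved through a 33-bit virtual register,
 * with `polls` extra instruction scans spent waiting on the full flag */
static unsigned jtag_transfer_cycles(unsigned polls)
{
    unsigned total = jtag_ir_scan_cycles(5u);               /* first ir scan   */
    total += polls * jtag_chained(jtag_ir_scan_cycles(5u)); /* polling scans   */
    total += jtag_chained(jtag_dr_scan_cycles(33u));        /* 33-bit dr scan  */
    return total;
}
```

with a single polling scan, jtag_transfer_cycles(1) gives 12 + 11 + 38 = 61 cycles, the total listed in the table; every additional poll adds 11 cycles, which is what the '+' in the table stands for.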
18.3.3.2 transferring data from trimedia via jtag

poll the control register to check if the output buffer is full. scan out data when it is full and clear the ofull control bit. note that scanning in any instruction automatically scans out the 3 least significant bits (including the ifull and ofull bits) of the jtag_ctrl register.

table 18-4. transfer of data out via jtag

action                                                         number of tck cycles
ir shift in sel_ofull_out instruction                          12
while jtag_ctrl.ofull = 0, scan in sel_ofull_out instruction   11+
dr scan 33 bits of register jtag_ofull_out                     38
total                                                          61+ cycles

note that the above timings do not include the overheads of the jtag software driver for a jtag interface module plugged into a pc.

18.3.4 jtag interface module

it is expected that the interface module will be a programmable jtag interface module. one end of the module should be connected to a jtag tap and the other end to a host computer via a serial or parallel line, or plugged into a pc. it is up to the jtag driver software on a host computer to program the jtag interface module via the serial/parallel interface for transferring data to/from the target. the transfer rates will depend on the interface module.
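
the handshake of section 18.3.2 can be modeled in software as a pair of one-word mailboxes. the c sketch below is a behavioural model under stated assumptions, not driver code: struct jtag_mailbox and the four functions are hypothetical names, the scan itself is abstracted away, and the model assumes that shifting a '1' into ifull leaves ifull set until the monitor clears it (the data book states only that this write posts an interrupt).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* behavioural model of the handshake state; not a register map */
struct jtag_mailbox {
    uint32_t data_in;    /* models jtag_data_in  */
    uint32_t data_out;   /* models jtag_data_out */
    bool ifull;          /* models jtag_ctrl.ifull */
    bool ofull;          /* models jtag_ctrl.ofull */
    unsigned irq_count;  /* interrupts posted to the debug monitor */
};

/* front-end: scan a word plus ifull = '1' through jtag_ifull_in.
 * returns false when the monitor has not consumed the previous word. */
static bool frontend_send(struct jtag_mailbox *m, uint32_t word)
{
    assert(m != 0);
    if (m->ifull)
        return false;
    m->data_in = word;
    m->ifull = true;     /* assumption: the '1' shifted in sets the flag */
    m->irq_count++;      /* writing '1' to ifull posts the interrupt     */
    return true;
}

/* monitor interrupt handler: copy the word, then clear ifull */
static uint32_t monitor_receive(struct jtag_mailbox *m)
{
    uint32_t word = m->data_in;
    m->ifull = false;
    return word;
}

/* monitor: write jtag_data_out only when ofull is clear, then set ofull */
static bool monitor_send(struct jtag_mailbox *m, uint32_t word)
{
    if (m->ofull)
        return false;
    m->data_out = word;
    m->ofull = true;
    return true;
}

/* front-end: scan out jtag_ofull_out; ofull_in is the bit shifted into
 * ofull ('1' clears the flag, '0' leaves it as it was) */
static bool frontend_receive(struct jtag_mailbox *m, bool ofull_in,
                             uint32_t *word)
{
    if (!m->ofull)
        return false;
    *word = m->data_out; /* read-only via jtag: the update has no effect */
    if (ofull_in)
        m->ofull = false;
    return true;
}
```

the model makes the ownership rule visible: each data register has exactly one writer at a time, and the full flag is the only thing that hands ownership to the other side.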
on-chip semaphore assist device
chapter 19

19.1 overview

in this document, the generic pnx1300 name refers to the pnx1300 series, or the pnx1300/01/02/11 products.

pnx1300 has a simple mp semaphore-assist device. it is a 32-bit register, accessible through mmio by either the local pnx1300 cpu or by any other cpu on pci through the aperture made available on pci. the semaphore, sem, is located at mmio offset 0x10 0500.

sem operation is as follows: each master in the system constructs a personal nonzero 12-bit id (see below). to obtain the global semaphore, a master does the following action:

    write id to sem (use 32 bit store, with id in 12 lsb)
    retrieve sem (use 32 bit load, it returns 0x00000nnn)
    if (sem == id) {
        'performs a short critical section action'
        write 0 to sem
    }
    else 'try again later, or loop back to write'

19.2 sem device specification

sem is a 32-bit mmio location. the 12 lsbs consist of storage flip-flops with surrounding logic; the 20 msbs always return a '0' when read.

sem is reset to '0' by power up reset.

when sem is written to, the storage flip-flops behave as follows:

    if (cur_content == 0) new_content = write_value;
    else if (write_value == 0) new_content = 0;
    /* else no action ! */

19.3 constructing a 12-bit id

a pnx1300 processor can construct a personal, nonzero 12-bit id in a variety of ways. below are some suggestions.

pci configspace personality entry. each pnx1300 receives a 16-bit personality value from the eeprom during boot. this personality register is located at offset 0x40 in configuration space. in an mp system, some of the bits of personality can be individualized for each cpu involved, giving it a unique 2/3/4-bit id, as needed given the maximum number of cpus in the design.

in the case of a host-assisted pnx1300 boot, the pci bios assigns a unique mmio_base and dram_base to every pnx1300.
in particular, the 11 msbs of each mmio_base are unique, since each mmio aperture is 2 mb in size. these bits can be used as a personality id. set bit 11 (the msb of the 12-bit id) to '1' to guarantee a nonzero id.

19.4 which sem to use

each pnx1300 in the system adds a sem device to the mix. the intended use is to treat one of these sem devices as the master semaphore in the system. many methods can be used to determine which sem is the master sem. one example:

each dspcpu can use pci configuration space accesses to determine which other pnx1300s are present in the system. then, the pnx1300 with the lowest personality number, or the lowest mmio_base, is chosen as the pnx1300 containing the master semaphore.

19.5 usage notes

to avoid contention on the master sem device, it should only be used for inter-processor semaphores. processes running on a single cpu can use regular memory to implement synchronization primitives.

the critical section associated with sem should be kept as short as possible. preferably, sem should only be used as the basis to make multiple memory-resident simple semaphores. in this case, the non-cacheable dram area of each pnx1300 can be used to implement the semaphore data structures efficiently.

as described here, sem does not guarantee starvation-free access to critical resources. claiming of sem is purely stochastic. this should work fine as long as sem is not overloaded. utmost care should be taken in sem access frequency and in the duration of the basic critical sections to keep the load conditions reasonable.

[figure: sem register layout at mmio offset 0x10 0500. bits 31:12 always read as 0; bits 11:0 hold sem.]
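the sem protocol of sections 19.1 to 19.3 can be sketched in c as follows. this is a minimal software model, not driver code: the register is represented by a plain struct instead of the mmio word at offset 0x10 0500, and the names sem_model_t, sem_write, sem_read, sem_id_from_mmio_base, sem_try_acquire and sem_release are hypothetical names introduced for this illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* software model of the sem register (assumption: real code would access
 * the mmio word at offset 0x10 0500 instead of this struct) */
typedef struct { uint32_t bits; } sem_model_t;

/* write rule of section 19.2: only the 12 lsbs are stored */
static void sem_write(sem_model_t *sem, uint32_t value)
{
    uint32_t v = value & 0xFFFu;
    if (sem->bits == 0)
        sem->bits = v;      /* sem is free: latch the writer's id */
    else if (v == 0)
        sem->bits = 0;      /* writing 0 releases sem */
    /* else: sem is held, a nonzero write has no effect */
}

/* the 20 msbs always read as 0 */
static uint32_t sem_read(const sem_model_t *sem)
{
    return sem->bits & 0xFFFu;
}

/* id suggestion of section 19.3: the 11 msbs of mmio_base (2 mb aperture),
 * with bit 11 forced to 1 so the id can never be zero */
static uint32_t sem_id_from_mmio_base(uint32_t mmio_base)
{
    return ((mmio_base >> 21) & 0x7FFu) | 0x800u;
}

/* acquire attempt of section 19.1: write the id, read back, compare.
 * returns 1 when the caller now owns sem, 0 when it must retry later */
static int sem_try_acquire(sem_model_t *sem, uint32_t id)
{
    assert(id != 0 && id <= 0xFFFu);
    sem_write(sem, id);
    return sem_read(sem) == id;
}

/* release: write 0 after the short critical section */
static void sem_release(sem_model_t *sem)
{
    sem_write(sem, 0);
}
```

a caller would loop on sem_try_acquire, run its short critical section, and then call sem_release. as section 19.5 notes, this scheme is not starvation-free, so a retry loop should back off rather than spin at full rate.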
chapter 20 arbiter

by eino jacobs, luis lucas, chris nelson, allan tzeng, gert slavenburg

20.1 arbiter features

in this document, the generic pnx1300 name refers to the pnx1300 series, or the pnx1300/01/02/11 products.

the pnx1300 internal highway bus conveys all the memory and mmio traffic. the on-chip peripheral units described in this databook are connected to this internal highway bus. accesses to the bus are controlled by a central arbiter. figure 2-1 on page 2-2 shows the whole system, where the arbiter is embedded in the main memory interface (mmi) block. the traffic includes the memory requests issued by most of the on-chip units as well as the mmio transactions issued by the dspcpu or pci block and responded to by the peripherals.

the arbiter was designed to make pnx1300 a true real-time system by providing a highly programmable bus bandwidth allocation scheme. the primary characteristics are:
- round robin arbitration
- hierarchical organization
- programmable allocation of highway bandwidth
- dual priorities with priority raising mechanism

these features are explained in the next sections of this chapter.

the arbiter is programmed through two mmio registers:
- arb_raise
- arb_bw_ctl

the default values (after hardware reset) stored in these two mmio registers are suitable for most applications. if these default settings introduce violations of real-time constraints in units like video in (vi), video out (vo), audio in (ai) and audio out (ao) (each of these units has a highway bandwidth error detection mechanism), the arb_bw_ctl register should be programmed to 0x090a9. this setting gives almost maximum priority to real-time units but may slow down the cpu.

fine tuning of the arbiter settings is described in the following sections.

20.2 dual priorities with priority raising mechanism

the best cpu performance is obtained if cache misses can take priority over peripheral requests on the highway.
however, peripherals need to have a maximum guaranteed latency low enough to satisfy the real-time constraints of i/o units.

pnx1300 provides this feature with the following priority-raising mechanism.

peripheral unit requests can have 2 priorities: low and high. within each class there is fair, round-robin arbitration (section 20.3). requests with high priority take precedence over requests with low priority. units can indicate the priority of their requests to be low or high.

a unit may initially post a request with low priority. if the request is not serviced within a particular waiting time, the unit can raise the priority of the request to high. this can be done when the worst case latency at high priority approaches the real-time constraint of the unit. thus, the unit uses only spare bandwidth, without slowing down the cpu, unless real-time constraints require it to claim high priority.

in pnx1300, only the icp unit has its own priority raising logic (i.e. it controls the low to high transition of the request). refer to chapter 14, "image coprocessor," for more information.

priority raising for the vld, pci, vi and vo units is handled by the arbiter central priority raising mechanism. the central priority raising mechanism settings are controlled from the dspcpu with the arb_raise mmio register (see table 20-1). the delay is the amount of time for which the arbiter handles the request at low priority. the delay is defined by a 5-bit field (dedicated per unit) and is counted in cpu clock cycles. the granularity of the delay is 16 cycles, so the maximum time spent at low priority for each request can be programmed from 0 to 496 cycles, inclusive, in increments of 16 cycles.

table 20-1. arb_raise register layout

offset     name        bits     field
0x10010c   arb_raise   19:15    vld_delay[4:0]
                       14:10    pci_delay[4:0]
                       9:5      vi_delay[4:0]
                       4:0      vo_delay[4:0]

the default value for the entire arb_raise register is '0'. this causes all requests from vld, pci, vi and vo to be handled as high-priority requests until the arb_raise register contents have been changed to suit the application requirements.

corner-case note: there is some risk in setting the delay high and then lowering it, as the last request submitted with the high delay might violate the latency constraints of the new real-time domain. however, this should not happen, since this register should be set before the application starts.

the other units (ai, ao and bti (boot block)) and the cpu always have their requests considered as high priority. high priority for the cpu gives the maximum possible performance. ao and ai requests occur at a very low rate; hence, the probability that they take time away from the cpu is negligible.

20.3 round robin arbitration

in addition to the dual priority mechanism, round-robin arbitration is used to schedule requests with the same priority. the purpose is to ensure, for every unit with a high-priority request, a maximum latency for gaining access to the highway and/or a minimum share of the available bandwidth.

round-robin arbitration ensures that no starvation of requests can occur, and therefore requests with real-time constraints can be handled in time.

the round robin arbitration algorithm is as follows. requests are granted according to a dynamic priority list. whenever a unit request is granted, that unit is moved to the last position in the priority list and another unit is moved to the first position in the priority list. priorities are rotated. a unit with a waiting request will eventually reach the first place in the priority list.

as an example, figure 20-1 shows a state diagram of an arbitration state machine with 2 requesters. the nodes a and b indicate states a and b. in state a, requester a has ownership of the highway; in state b, requester b has ownership.
the arc from state a to state b indicates that if the current state is state a and a request from requester b is asserted, then a transition to state b occurs, i.e. ownership of the highway passes from requester a to requester b.

when, in a particular state, none of the arcs leaving from that node has its condition fulfilled, the state machine remains in the same state. when both requesters a and b have requests asserted, ownership of the highway alternates between a and b, creating a fair allocation of ownership.

figure 20-2 pictures a state diagram that provides fair arbitration with 3 requesters.

[figure 20-1. state diagram of round robin arbitrator with 2 requesters.]

[figure 20-2. state diagram of round robin arbitrator with 3 requesters.]

20.3.1 weighted round robin arbitration

not all units need to have equal latency and bandwidth. it is preferred to allocate bandwidth to units according to their needs. this is achieved with weighted round-robin arbitration, which is illustrated in the following examples.

figure 20-3 pictures a state machine with two requesters a and b, with double weight given to requester a. there are now 2 states, a1 and a2, where requester a has ownership of the highway. when both a and b requests are asserted, requester a has ownership of the highway twice as often as requester b.

[figure 20-3. state diagram of round robin arbitrator with 2 requesters; a has double weight.]
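the 2:1 behavior described for figure 20-3 can be illustrated with a small slot-table sketch. this is an assumption-laden model, not the hardware state machine: the arcs of the figure are not recoverable from this text, so the sketch simply rotates through a table in which requester a owns two slots and requester b owns one. the names wrr_t and wrr_grant are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* hypothetical slot-table arbiter: slots[i] is the requester that owns
 * slot i, e.g. {0, 0, 1} gives requester 0 (a) double weight over 1 (b) */
typedef struct {
    const int *slots;   /* requester index per slot */
    size_t     nslots;  /* number of slots in the rotation */
    size_t     last;    /* slot that received the most recent grant */
} wrr_t;

/* grant the next slot, in rotation order after the last granted slot,
 * whose owner has a request asserted. bit i of req_mask set means that
 * requester i is asking. returns the granted requester, or -1 if none */
static int wrr_grant(wrr_t *arb, unsigned req_mask)
{
    assert(arb->nslots > 0);
    for (size_t step = 1; step <= arb->nslots; step++) {
        size_t s = (arb->last + step) % arb->nslots;
        if (req_mask & (1u << arb->slots[s])) {
            arb->last = s;
            return arb->slots[s];
        }
    }
    return -1;  /* no request asserted: ownership does not change */
}
```

with both requesters asking continuously, the grant sequence repeats a, a, b, so a owns the highway twice as often as b. when only b is asking, b is granted every time, so unused slots are not wasted.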
figure 20-4 shows a state machine with 3 requesters in which double weight is given to requester a.

such state machines can become very complex and cannot be implemented for a large system like pnx1300 with 9 requesters. hierarchy, in the form of arbitration levels, is used to overcome this problem.

[figure 20-4. state diagram of round robin arbitrator with 3 requesters; a has double weight.]

20.3.2 arbitration levels

the arbitration is split into multiple levels of hierarchy. each level of hierarchy has an independent arbitration state machine. at the bottom of the hierarchy, the arbitration is performed between a group of units. whichever of these units "wins" is passed to the next level of hierarchy, where the selected unit competes with other units at that level for highway access. this is continued until the highest level of arbitration.

by splitting arbitration into multiple levels, it is easy to support a large number of highway units while the complexity of the arbitration state machines at each level of hierarchy remains modest.

[figure 20-5. arbitration architecture. six arbitration levels (l1 to l6) plus a priority-based arbitration stage for the cache requests. request inputs shown: vo_req, icp_reqh, icp_reql, vi_req, pci_req, vld_req, ai_req, ao_req, dvdd_req, spdo_req, bti_req, bti_mmio_req, pci_mmio_req, ic_req, dc_req, dc_mmio_req, dc_req_pref. the programmable weights per level are listed in table 20-3.]
hierarchy also makes it easy and natural to allocate bus bandwidth or latency to a group of units. the most bandwidth- or latency-demanding units are located at the top of the hierarchy, while the less demanding units are at the bottom and get a small share of the overall bandwidth.

20.4 arbiter architecture

in addition to the dual priority mechanism described in section 20.2, pnx1300 supports an arbitration architecture made of 6 fixed levels of hierarchy. this is combined with a programmable weighted round robin algorithm per level, as pictured in figure 20-5.

the weights can be adjusted by software to allocate bandwidth and latency depending on application requirements. within a level of hierarchy the units can have equal weights, giving them an equal share of bandwidth. alternatively, they can have different weights, giving them an unequal share of the bandwidth for that level.

the arbitration weights at each level are described in table 20-3 and illustrated in figure 20-5.

table 20-2 presents the minimum bandwidth allocation at level 1 between the dspcpu and the peripherals (level 2) according to the different weight values that can be programmed. note that programming a weight of 3/3 or 2/2 instead of 1/1 is legal and results in the same allocation.

note: the different types of requests from the dspcpu caches are arbitrated locally before a single cpu request is sent to the arbiter. the pci bus also performs local arbitration before sending a system request to the arbiter.

the weight programming is done by setting the mmio register arb_bw_ctl. the register offset as well as the field description and coding are provided in table 20-4.

the hardware reset value of arb_bw_ctl is 0, resulting in a weight of 1 for all requests. note that each media processor application needs to carefully review its arbiter settings.

table 20-2. minimum bandwidth allocation between cpu caches and peripheral units

weight of cpu and caches   weight of level 2   bandwidth at level 1   bandwidth at level 2
3                          1                   75%                    25%
2                          1                   67%                    33%
3                          2                   60%                    40%
1                          1                   50%                    50%
2                          3                   40%                    60%
1                          2                   33%                    67%
1                          3                   25%                    75%

table 20-3. arbitration weights at each level

level     arbitration weights
level 1   cpu mmio, dcache and icache are arbitrated with fixed priorities between each other and together have a programmable weight of 1, 2 or 3. level 2 has a programmable weight of 1, 2 or 3.
level 2   the vo unit has a programmable weight of 1, 3 or 5. level 3 has a programmable weight of 1, 3, 5 or 7.
level 3   the icp unit has a programmable weight of 1, 3, 5 or 7. level 4 has a programmable weight of 1, 3 or 5.
level 4   the vi unit has a programmable weight of 1 or 2. level 5 has a programmable weight of 1, 3 or 5.
level 5   the pci unit has a programmable weight of 1, 3 or 5. level 6 has a programmable weight of 1 or 2.
level 6   level 6 contains several lower bandwidth and/or latency-tolerant units. the vld has a weight of 2. ai, ao, dvdd and the boot block (only active during booting) have a weight of 1.

table 20-4. arb_bw_ctl mmio register (offset 0x100104)

level of arbitration   field        bits    allowed values
n/a                    reserved     25:18
level 1                cpu weight   17:16   00 = weight 1, 01 = weight 2, 10 = weight 3
level 1                l2 weight    15:14   00 = weight 1, 01 = weight 2, 10 = weight 3
level 2                vo weight    13:12   00 = weight 1, 01 = weight 3, 10 = weight 5
level 2                l3 weight    11:10   00 = weight 1, 01 = weight 3, 10 = weight 5, 11 = weight 7
level 3                icp weight   9:8     00 = weight 1, 01 = weight 3, 10 = weight 5, 11 = weight 7
level 3                l4 weight    7:6     00 = weight 1, 01 = weight 3, 10 = weight 5
level 4                vi weight    5       0 = weight 1, 1 = weight 2
level 4                l5 weight    4:3     00 = weight 1, 01 = weight 3, 10 = weight 5
level 5                pci weight   2:1     00 = weight 1, 01 = weight 3, 10 = weight 5
level 5                l6 weight    0       0 = weight 1, 1 = weight 2
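the field coding of table 20-4 can be sketched as a small encoder. this is an illustration only: the type arb_weights_t and the functions weight_code and arb_bw_ctl_encode are hypothetical names, and the sketch only computes the register value, it does not write it to mmio. decoding the value 0x090a9 recommended in section 20.1 with table 20-4 gives cpu 1, l2 3, vo 3, l3 1, icp 1, l4 5, vi 2, l5 3, pci 1 and l6 2, which the encoder reproduces.

```c
#include <assert.h>
#include <stdint.h>

/* weights per arbitration level, in the order of table 20-4
 * (hypothetical struct, introduced for this sketch only) */
typedef struct {
    int cpu, l2;   /* level 1: 1, 2 or 3 each */
    int vo, l3;    /* level 2: vo 1/3/5, l3 1/3/5/7 */
    int icp, l4;   /* level 3: icp 1/3/5/7, l4 1/3/5 */
    int vi, l5;    /* level 4: vi 1/2, l5 1/3/5 */
    int pci, l6;   /* level 5: pci 1/3/5, l6 1/2 */
} arb_weights_t;

/* map a weight to its field code: weights 1, 1+step, 1+2*step, ... up to
 * max are coded as 0, 1, 2, ... (step 1 for 1/2/3, step 2 for 1/3/5/7) */
static uint32_t weight_code(int w, int step, int max)
{
    assert(w >= 1 && w <= max && (w - 1) % step == 0);
    return (uint32_t)((w - 1) / step);
}

/* assemble the arb_bw_ctl value using the bit positions of table 20-4 */
static uint32_t arb_bw_ctl_encode(const arb_weights_t *w)
{
    return (weight_code(w->cpu, 1, 3) << 16)
         | (weight_code(w->l2,  1, 3) << 14)
         | (weight_code(w->vo,  2, 5) << 12)
         | (weight_code(w->l3,  2, 7) << 10)
         | (weight_code(w->icp, 2, 7) << 8)
         | (weight_code(w->l4,  2, 5) << 6)
         | (weight_code(w->vi,  1, 2) << 5)
         | (weight_code(w->l5,  2, 5) << 3)
         | (weight_code(w->pci, 2, 5) << 1)
         |  weight_code(w->l6,  1, 2);
}
```

keeping the range check inside weight_code means an illegal weight (for example vo weight 2) trips an assertion instead of silently producing a reserved field code.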
20.5 arbiter programming

the pnx1300 arbiter accepts programmable bandwidth weights to directly control the percentage of bandwidth allocated to each unit. in the worst case all bandwidth is used. if not all of the bandwidth is used, then all units eventually get their desired bandwidth (as the bus becomes free) regardless of the weights. however, the weights still indirectly guarantee each unit a worst-case latency, which is important for the real-time behavior.

there are two basic types of pnx1300 coprocessor and peripheral units. the first type is units which have hard real-time constraints, i.e. vo, vi, ao and ai. to ensure multimedia functionality, these units must be able to acquire the bus within a fixed amount of time in order to fill or empty a buffer before it over- or underflows.

the second type, the cpu, pci, icp, vld and dvdd units, can absorb long latencies, but performance is enhanced (there are fewer stall cycles or waiting cycles) if latency is short. the bandwidth requirement is usually known and depends on the application. in particular, icp and vld or dvdd have fixed bandwidth requirements in multimedia applications.

for the pnx1300 dspcpu, latency is of prime importance. cpu performance decreases as average latency increases. the design of the arbiter guarantees that the dspcpu gets all unused bus bandwidth with the lowest possible latency. optimal operation is achieved if the arbiter is set in such a way that the dspcpu has the best possible latency given the required latency and bandwidth of the units active in the application.

to pick programmable weights and priority raising delays, the following procedure is recommended:

1. try to keep cpu weight as high as possible through the remaining steps.
2. pick weights sufficient to guarantee latency to hard real-time peripherals (see section 20.5.1).
3. pick weights for the remaining peripherals in order to give enough bandwidth to each (see section 20.5.2). step 2 above has priority, because bandwidth can be acquired as the bus becomes free and because the hard real-time units use a known amount of bandwidth.
4. if latency and bandwidth slack remains, increase priority raise delays in order to improve average cpu latency.

20.5.1 latency analysis

in the following, ceil(x) is the least integral value greater than or equal to x.

the latency requirement of each real-time unit is defined in that unit's chapter of this databook. refer to the related sections to find the latency requirement for the mode and clock speed at which the unit is operating. this required latency has to be larger than the maximum latency l_x (in nanoseconds) guaranteed by the arbiter.

for a unit x the arbiter guarantees a latency of:

\[ l_x = l_{x,sc} \cdot (\text{sdram cycle time in ns}) \]

where

\[ l_{x,sc} = (d_x \cdot t) + e + \mathrm{ceil}(d_x \cdot t / k_d) \cdot k + \mathrm{ceil}(16 \cdot r_x / c) \]

is the latency in sdram clock cycles. latency in cpu clock cycles is defined by:

\[ l_{x,cc} = \mathrm{ceil}(l_{x,sc} \cdot c) \]

the symbols are defined as follows:

t = 20 cycles (transaction length, assuming the worst case pattern of alternating reads and writes).

e = 10 cycles (extra delay in case the first transaction made by the cpu requires a different bank order to satisfy critical word first).

k = 19 cycles (refresh transaction length).

k_d is the programmed refresh interval (see section 12.11 on page 12-6).

c is the cpu/sdram ratio (i.e. 5/4, 4/3, 3/2, 2/1 or 1, as explained in section 12.6.2 on page 12-4).

r_x is the priority raise delay of unit x as stored in mmio register arb_raise (see section 20.2). r_x = 0 for units other than vo, vi, pci or vld.

d_x is the worst case number of requests that the arbiter allows before the request from unit x goes through. d_x includes the transaction from unit x (the unit which needs the data) as well as the internal implementation delays that occur in the transaction.
d_x is derived from the arbiter settings as follows:

\[ d_{cpu} = \mathrm{ceil}\left(\frac{cpu_{weight} + l2_{weight}}{cpu_{weight}}\right) \]
\[ d_{vo} = \mathrm{ceil}\left(\frac{vo_{weight} + l3_{weight}}{vo_{weight}}\right) \cdot d_2 + 1 \]
\[ d_{icp} = \mathrm{ceil}\left(\frac{icp_{weight} + l4_{weight}}{icp_{weight}}\right) \cdot d_3 + 1 \]
\[ d_{vi} = \mathrm{ceil}\left(\frac{vi_{weight} + l5_{weight}}{vi_{weight}}\right) \cdot d_4 + 1 \]
\[ d_{pci} = \mathrm{ceil}\left(\frac{pci_{weight} + l6_{weight}}{pci_{weight}}\right) \cdot d_5 + 1 \]
\[ d_{vld} = \mathrm{ceil}\left(\frac{2+1+1+0+1+1}{2}\right) \cdot d_6 + 1 \]
\[ d_{ai} = \mathrm{ceil}\left(\frac{2+1+1+0+1+1}{1}\right) \cdot d_6 + 1 \]
\[ d_{ao} = \mathrm{ceil}\left(\frac{2+1+1+0+1+1}{1}\right) \cdot d_6 + 1 \]
\[ d_{dvdd} = \mathrm{ceil}\left(\frac{2+1+1+0+1+1}{1}\right) \cdot d_6 + 1 \]
\[ d_{spdo} = \mathrm{ceil}\left(\frac{2+1+1+0+1+1}{1}\right) \cdot d_6 + 1 \]

where

\[ d_2 = \mathrm{ceil}\left(\frac{cpu_{weight} + l2_{weight}}{l2_{weight}}\right) \]
\[ d_3 = \mathrm{ceil}\left(\frac{vo_{weight} + l3_{weight}}{l3_{weight}}\right) \cdot d_2 \]
\[ d_4 = \mathrm{ceil}\left(\frac{icp_{weight} + l4_{weight}}{l4_{weight}}\right) \cdot d_3 \]
\[ d_5 = \mathrm{ceil}\left(\frac{vi_{weight} + l5_{weight}}{l5_{weight}}\right) \cdot d_4 \]
\[ d_6 = \mathrm{ceil}\left(\frac{pci_{weight} + l6_{weight}}{l6_{weight}}\right) \cdot d_5 \]

as an example, if cpu weight is 3, l2 weight is 2, vo weight is 3 and l3 weight is 7, then
- d_2 is ceil[(3 + 2) / 2] = 3,
- d_vo is ceil[(3 + 7) / 3] * 3 + 1 = 13.

if the cpu/sdram ratio is 5/4 (for example, memory frequency is 80 mhz and cpu frequency is 100 mhz), the refresh interval k_d is 1220 cycles, and r_x is 2, then the maximum latency for vo is:
- l_vo,sc = 13 * 20 + 10 + ceil[13 * 20 / 1220] * 19 + ceil[16 * 2 / (5 / 4)] = 315 sdram cycles
- l_vo = l_vo,sc * 12.5 = 3937.5 ns

note: average latency is normally much lower than worst case latency, because only on rare occasions will many units issue requests at exactly the same time (this is assumed when evaluating the maximum latency).

note: all real-time units have a special exception notification flag that is raised if an overflow or underflow occurs while operating.

note: to compute the latency l_x when a unit is not enabled, its weight has to be set to '0' in the d_{2,3,4,5,6} equations and in d_{ai,ao,vld} for ai, ao or vld.

these equations are not accurate for all the weights, but give an upper bound of the worst case (which is usually too pessimistic).

a much more accurate number could be found by simulating the arbiter. for example, if the settings are cpu weight = 1, l2 weight = 2, vo weight = 1 and l3 weight = 1, then

d_vo = ceil[(1 + 1) / 1] * ceil[(1 + 2) / 2]

giving 4 requests. but the actual worst case order of granted requests is cpu, l3, vo, resulting in 3 requests only.

20.5.2 bandwidth analysis

in the following, ceil(x) means the least integral value greater than or equal to x.

the minimum bandwidth b_x allocated by the arbiter to a unit x is defined as follows:

\[ b_x = \frac{(m_{cycles} - k_k) \cdot s}{t \cdot e_x + (16 \cdot r_x / c)} \]

where:

m_cycles is the total number of sdram cycles available in the period p over which the bandwidth is computed. for example, if the period is 1 second and sdram runs at 80 mhz, then m_cycles is 80,000,000.

k_k is the number of sdram cycles used by the refresh during the same period p. if p is in seconds it can be expressed as:

\[ k_k = \mathrm{ceil}(4096 \cdot p / 0.064) \cdot k \]

for example, if p is 1 second then k_k is ceil(4096 * 1 / 0.064) * 19 = 1216000 sdram cycles.

s is the size of the transaction on the bus. for pnx1300, s is equal to 64 (bytes).

e_x is the ratio of requests available for a unit x according to the arbiter settings. it means that unit x will get 1 / e_x of the total requests.

e_x is derived from the arbiter settings as follows:

\[ e_{cpu} = \frac{cpu_{weight} + l2_{weight}}{cpu_{weight}} \]
\[ e_{vo} = \frac{vo_{weight} + l3_{weight}}{vo_{weight}} \cdot e_2 \]
\[ e_{icp} = \frac{icp_{weight} + l4_{weight}}{icp_{weight}} \cdot e_3 \]
\[ e_{vi} = \frac{vi_{weight} + l5_{weight}}{vi_{weight}} \cdot e_4 \]
\[ e_{pci} = \frac{pci_{weight} + l6_{weight}}{pci_{weight}} \cdot e_5 \]
\[ e_{vld} = \frac{2+1+1+0+1+1}{2} \cdot e_6 \]
\[ e_{ai} = \frac{2+1+1+0+1+1}{1} \cdot e_6 \]
\[ e_{ao} = \frac{2+1+1+0+1+1}{1} \cdot e_6 \]
\[ e_{dvdd} = \frac{2+1+1+0+1+1}{1} \cdot e_6 \]
\[ e_{spdo} = \frac{2+1+1+0+1+1}{1} \cdot e_6 \]

where:

\[ e_2 = \frac{cpu_{weight} + l2_{weight}}{l2_{weight}} \]
\[ e_3 = \frac{vo_{weight} + l3_{weight}}{l3_{weight}} \cdot e_2 \]
\[ e_4 = \frac{icp_{weight} + l4_{weight}}{l4_{weight}} \cdot e_3 \]
\[ e_5 = \frac{vi_{weight} + l5_{weight}}{l5_{weight}} \cdot e_4 \]
\[ e_6 = \frac{pci_{weight} + l6_{weight}}{l6_{weight}} \cdot e_5 \]
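the latency and bandwidth formulas of sections 20.5.1 and 20.5.2 can be checked numerically with a short sketch. it is an illustration under stated assumptions: all function names (ceil_div, d_next_level, d_unit, latency_sc, refresh_cycles, bandwidth) are hypothetical, the cpu/sdram ratio c is passed as a fraction c_num / c_den so that the ceil terms can use integer arithmetic, and only units at level 2 and below (the "+ 1" form of d_x) are covered.

```c
#include <assert.h>

/* ceil(a / b) for non-negative a and positive b */
static long ceil_div(long a, long b)
{
    assert(a >= 0 && b > 0);
    return (a + b - 1) / b;
}

/* d_n of the next level down: ceil((unit + next) / next) * d_prev.
 * pass d_prev = 1 to obtain d_2 from the cpu and l2 weights */
static long d_next_level(long unit_w, long next_w, long d_prev)
{
    return ceil_div(unit_w + next_w, next_w) * d_prev;
}

/* d_x of a unit at level n (n >= 2): ceil((unit + next) / unit) * d_n + 1 */
static long d_unit(long unit_w, long next_w, long d_n)
{
    return ceil_div(unit_w + next_w, unit_w) * d_n + 1;
}

/* l_x,sc in sdram cycles, with c = c_num / c_den */
static long latency_sc(long d_x, long k_d, long r_x, long c_num, long c_den)
{
    const long t = 20, e = 10, k = 19;   /* constants of section 20.5.1 */
    return d_x * t + e + ceil_div(d_x * t, k_d) * k
         + ceil_div(16 * r_x * c_den, c_num);
}

/* k_k: refresh cycles in a period of p_ms milliseconds
 * (4096 refreshes every 64 ms, 19 sdram cycles each) */
static long refresh_cycles(long p_ms)
{
    return ceil_div(4096 * p_ms, 64) * 19;
}

/* b_x in bytes per period, for request ratio e_x */
static double bandwidth(double m_cycles, double k_k, double e_x,
                        double r_x, double c)
{
    const double t = 20.0, s = 64.0;     /* cycles and bytes per transaction */
    return (m_cycles - k_k) * s / (t * e_x + 16.0 * r_x / c);
}
```

with cpu weight 3, l2 weight 2, vo weight 3 and l3 weight 7, d_next_level(3, 2, 1) gives d_2 = 3 and d_unit(3, 7, 3) gives d_vo = 13. latency_sc(13, 1220, 2, 5, 4) then gives the 315 sdram cycles of the worked latency example, and bandwidth(80e6, 1216000, e_vo, 2, 1.25) with e_vo = (10 / 3) * 2.5 gives roughly 26.2 million bytes per second.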
for example, with the same settings as in the example of section 20.5.1:
- e_2 is (3 + 2) / 2 = 2.5
- e_vo is (3 + 7) / 3 * 2.5 = 8.33, which gives
- b_vo = (80 - 1.216) * 64 / [20 * 8.33 + 16 * 2 / (5 / 4)]

resulting in 26.23 million bytes/sec, corresponding to 25.01 mb/sec.

note: to compute the bandwidth b_x when a unit is not enabled, its weight has to be considered as '0' in the e_{2,3,4,5,6} equations and in e_{ai,ao,vld} for ai, ao or vld.

the maximum number of requests a_x allowed for unit x during the m_cycles period is:

a_x = floor(b_x / s)

where floor(x) is the greatest integral value less than or equal to x.

note: this number does not take into account the worst case pattern for request acknowledgment. thus, if the period is too small, a_x is not accurate.

20.6 extended behavior analysis

the following sections describe the behavior of the pnx1300 arbitration system more accurately.

20.6.1 extended bandwidth analysis

the minimum bandwidth allocation derived from the arbiter settings is accurate if one of the two following conditions is true:
- the units emit requests all the time (i.e. do back-to-back requests).
- after a request has been acknowledged, the unit emits a new request before the next arbitration point. the arbitration is decided around every 16 cycles. this time depends on the direction of the transactions (read/write).

in pnx1300, the only unit almost able to sustain back-to-back requests is the data cache. the other units post a request and wait for the data before the next request is posted. this behavior makes the bandwidth computation:
- almost accurate if the unit is low in the arbiter hierarchy (true if the units placed above it are enabled).
- rather inaccurate if large weights are used for a unit. since no back-to-back requests are implemented, the worst case is that a unit can only get one request out of three if all the others are asking. this limits the use of large weights for units other than the data cache. however, some units might be able to catch one request out of two. this depends on the way requests interleave, since the arbitration point depends on the type of the request (read or write) as well as on the cpu ratio. this makes it almost impossible to describe the behavior precisely.

the exact bandwidth necessary for units like vo, vi, ao or ai is well known (see the dedicated sections in each corresponding chapter). if the arbiter settings allocate more bandwidth to these units than they can use, the extra bandwidth can be used by units that are located below these units (vo, vi) or at the same level (ao and ai) in the arbiter hierarchy.

as an example, with the default settings, vo gets 25% of the available bandwidth and the cpu gets 50%. if the sdram clock speed is 100 mhz, then 100 mb/sec are allocated to vo. if vo runs at 27 mhz (ntsc or pal mode), then vo will not use all of this allocated bandwidth. thus any of the units that are below vo in the arbiter hierarchy can potentially use the remaining allocated bandwidth.

in other words, even if only 10% is allocated to one unit like the cpu, pci or the icp, that unit may use more.

20.6.2 extended latency analysis

some units (vo and vi) have a latency/bandwidth requirement, and their behavior needs to be simulated in order to find the correct settings. for example, the requirement for vo (in image mode 4:2:2 or 4:2:0 without upscaling, overlay disabled) is:
- during 128 vo clock cycles, the vo block needs to have 2 requests acked ([2 ys, one u and one v]/2).

the default value '0' for arb_bw_ctl leads to a bus allocation of 50% for the cpu, 25% for vo and 25% for l3 blocks. the worst case arbitration for vo is then:

cpu l3 cpu vo, cpu l3 cpu vo

to which the refresh (k), internal delays (t) and e for the first cpu request need to be added.
the first vo request will require 129 sdram cycles (d_vo = 5, or from the worst case pattern, 19 + 10 + 20 + 4 * 20). the arbitration pattern shows that the following request will require (in the worst case) an extra 4 * 20 sdram cycles. thus the vo clock speed cannot be greater than 61.24% (128 / [129 + 80]) of the sdram clock speed.

by changing the settings to 33% for the cpu, 33% for vo and 33% for l3 blocks (i.e. cpu weight = 1, l2 weight = 2, vo weight = 1, l3 weight = 1), the new vo/sdram clock percentage becomes 75.74% (128 / [109 + 60]), corresponding to a worst case arbitration pattern of cpu l3 vo, cpu l3 vo.

before changing the settings, the minimum sdram speed required to run vo at 74.25 mhz (high definition speed) was 122 mhz. after the new allocation, 100 mhz is fine. note that here d_vo remains equal to 5.

the level 5 and level 6 request ratios used by the e_x equations of section 20.5.2 are:

\[ e_5 = \frac{vi_{weight} + l5_{weight}}{l5_{weight}} \cdot e_4 \]
\[ e_6 = \frac{pci_{weight} + l6_{weight}}{l6_{weight}} \cdot e_5 \]
when vo is running in image mode 4:2:2 or 4:2:0 without upscaling and with overlay enabled, the requirements become:
- during the first 64 vo clock cycles, at least one request must be acked (the ol (overlay) data).
- during 128 vo clock cycles, the vo block requires that 4 requests be acked ([4 ols, two ys, one v and one u]/2).

if the settings are 33% for the cpu, 33% for vo and 33% for l3 blocks, then the worst case arbitration pattern is cpu l3 vo, cpu l3 vo, etc. the first requirement limits the vo/sdram ratio to (64 / [19 + 10 + 20 + 3 * 20]) = 58.7%. the second requirement gives a vo/sdram ratio of 44.29% (128 / [19 + 10 + 20 + 3 * 20 + 3 * 20 * 3]).

thus, if the vo clock speed is supposed to be 54 mhz (progressive scan), the sdram must run at 122 mhz or more.

by setting the arbiter to 25% for the cpu, 37.5% for vo and 37.5% for vi (cpu weight = 1, l2 weight = 3, vo weight = 1, l3 weight = 1, assuming only vo and vi are enabled), the arbitration pattern becomes cpu vi vo vi cpu vo vi vo cpu vi vo. now both vi and vo are able to catch one request out of two, thanks to the read/write overlap. this leads to a vo/sdram ratio of 47.5%, or a 113 mhz sdram.

20.6.3 raising priority

if vo is running at 27 mhz (ntsc or pal) without overlay, and cpu weight is set to 3 while all the other weights are set to 1, then the worst case latency for vo derived from section 20.5.1 is:

l_vo,sc = (ceil[(1 + 1) / 1] + ceil[(3 + 1) / 1] + 1) * 20 + 10 + 19 = 169 sdram cycles (assumes r_vo = 0).

the latency requirement for vo is 1 request in 64 vo clock cycles. if sdram is running at 80 mhz, then the maximum latency tolerated by vo is floor(64 / (27 / 80)) = 189 sdram cycles.

this means that vo requests can remain at low priority for 189 - 169 = 20 sdram cycles. if the cpu clock speed is 100 mhz (ratio is 5 / 4), then the vo delay field of the arb_raise register can be programmed to:

floor(20 * (5 / 4) / 16) = 1.
vo requests will then stay at low priority for 16 cycles, allowing slightly better average cpu performance.

20.6.4 conclusion

there is no obvious way to set the best weights for latency or bandwidth allocation, since the behavior of each block cannot be easily described with equations. practical results obtained by running applications showed that once the arbiter is weighted to meet latencies, the remaining weight settings do not allow much improvement. the best way to tune the weights is by experiment, running the application.

the only accurate computation is the maximum worst case latency, which ensures that the hard real-time units work properly. this computation gives an upper bound and can be too pessimistic, but it still gives the right order of magnitude. refer to table 20-5 for the recommended allocation method.

table 20-5. recommended allocation method

unit         recommended allocation
video in     allocate required latency
video out    allocate required latency
audio in     allocate required latency
audio out    allocate required latency
spdif out    allocate required latency
icp          allocate bandwidth
pci          allocate bandwidth
vld          allocate bandwidth/latency
dvdd         allocate bandwidth/latency
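the arithmetic of section 20.6.3 and the field layout of table 20-1 can be combined in a short sketch. the function names raise_delay_field and arb_raise_encode are hypothetical, the cpu/sdram ratio is passed as a fraction c_num / c_den, and the clamp to 31 reflects the 5-bit field width; the sketch only computes values and does not access the register.

```c
#include <assert.h>
#include <stdint.h>

/* 5-bit delay field for a unit: floor(slack * c / 16), where slack is the
 * tolerated latency minus the worst case latency, both in sdram cycles,
 * and c = c_num / c_den is the cpu/sdram ratio. the result is clamped to
 * the 0..31 range of the field */
static unsigned raise_delay_field(long tolerated_sc, long worst_case_sc,
                                  long c_num, long c_den)
{
    assert(c_num > 0 && c_den > 0);
    long slack = tolerated_sc - worst_case_sc;
    if (slack <= 0)
        return 0;               /* no slack: keep requests at high priority */
    long field = (slack * c_num) / (c_den * 16);
    return field > 31 ? 31u : (unsigned)field;
}

/* assemble arb_raise from the four delay fields (bit layout of table 20-1) */
static uint32_t arb_raise_encode(unsigned vld, unsigned pci,
                                 unsigned vi, unsigned vo)
{
    assert(vld < 32 && pci < 32 && vi < 32 && vo < 32);
    return ((uint32_t)vld << 15) | ((uint32_t)pci << 10)
         | ((uint32_t)vi << 5) | (uint32_t)vo;
}
```

for the example of section 20.6.3, the tolerated latency is floor(64 * 80 / 27) = 189 sdram cycles and the worst case is 169, so raise_delay_field(189, 169, 5, 4) yields 1 and arb_raise_encode(0, 0, 0, 1) yields the register value 0x1.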
chapter 21 power management

by eino jacobs and hani salloum

21.1 overview

in this document, the generic pnx1300 name refers to the pnx1300 series, or the pnx1300/01/02/11 products.

pnx1300 supports power management in two ways:
- in global power-down mode, most clocks on the chip are shut down and the sdram main memory is brought into low-power self-refresh mode. the power of all on-chip peripheral blocks except for the bti (boot and i2c blocks), dcache, icache, pci, timers and vic blocks is shut off. some peripherals can be selectively prevented from participating in the global power down.
- a block power down mechanism allows power down of selected peripheral blocks.

21.2 entering and exiting global power down mode

power management is software controlled and is initiated by writing to the mmio register power_down. during execution of this mmio operation, the system is powered down without completing the mmio operation. when the system wakes up from power down mode, the mmio operation is completed.

this means that during program execution on the dspcpu, the moment of power down is defined exactly: any instruction before the instruction that contains the mmio operation is completed before entering power down mode. the instruction containing the mmio operation and all subsequent instructions are completed after wake-up from power down mode.

wake-up from power down mode is effected by receiving an interrupt (any interrupt) that passes the acceptance criteria of the interrupt controller. there is also a wake-up from power down if a peripheral unit asserts a memory request signal on the highway.

during power down mode the whole chip is powered down, except the plls, the interrupt logic, the timers, the wake-up logic in the mmi, and any logic in the peripheral units and pci bus interface that is not participating in the power down.
note: writing to the global power_down register (at offset 0x100108) has no effect on the contents of the block_power_down register (at offset 0x103428), and vice versa.

21.3 effect of global power down on peripherals

the on-chip peripheral units participate in global power down. this can be a programmable option for selected peripherals. these selected peripherals have a programmable mmio control bit, the sleepless bit, that can be used to prevent the peripheral from participating in the global power down mode. by default every peripheral unit must participate in power down.

the following peripheral units have the sleepless bit: video in, video out, audio in, audio out, spdo, ssi, and jtag.

the following peripherals do not have the sleepless bit and always participate in power down: vld, boot/i2c and icp.

the following peripherals do not participate in global power down, although they must power themselves down when they are inactive: vic, pci.

when a peripheral does not participate in global power down, it can still do regular main memory traffic. every time a peripheral unit asserts the highway request signal, the mmi will initiate a wake-up sequence. the cpu must execute software that initiates a new power down of the system. this software can be the wait-loop of the rtos.

programmer's note: since the system is awakened each time there is a transaction on the highway, it may be useful to have a software loop that activates the power_down mode. the activation is then conditional, most often on a global variable that is usually set by a handler. it then becomes mandatory to be sure that there are no interruptible jumps between the time the value of the global variable is fetched and compared by the dspcpu and the time the conditional write to the mmio is performed (it is the classical semaphore or test-and-set issue).
thus it is recommended that a separate function be used with the address of the variable as a parameter. this function then needs to be compiled specifically without interruptible jumps.

the wake-up from power down mode takes approximately 20 sdram clock cycles. this amount of time is added to the worst case latency for memory requests compared to the situation when the system is not in power down mode.
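the programmer's note above can be sketched in c roughly as follows. this is only an illustration under stated assumptions, not vendor code: the names power_down_if_idle and work_pending are hypothetical, the register is passed in as a pointer so the sketch can be exercised off-target, and the requirement that the function be compiled without interruptible jumps is a build setting that is not expressed in the source.

```c
#include <stdint.h>

/* offset of the power_down register in the mmio aperture (section 21.5) */
#define POWER_DOWN_OFFSET 0x100108u

/*
 * hypothetical helper: request global power down only when no work is pending.
 *   power_down_reg : address of power_down (mmio base + POWER_DOWN_OFFSET)
 *   work_pending   : flag set by an interrupt handler when work arrives
 * returns 1 if the power-down write was issued, 0 otherwise.
 * on the target, the test and the write must not be separated by an
 * interruptible jump; that is a compiler setting and is assumed here.
 */
int power_down_if_idle(volatile uint32_t *power_down_reg,
                       volatile const int *work_pending)
{
    if (*work_pending == 0) {
        /* the register has no content; the write itself triggers power down */
        *power_down_reg = 0;
        return 1;
    }
    return 0;
}
```

a wait-loop of an rtos could call such a helper repeatedly, passing the address of the flag that its handlers set, so that the fetch, compare and mmio write stay inside one separately compiled function.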
21.4 detailed sequence of events for global power down

the sequence of events to power down pnx1300 is as follows:
• issue an mmio write to the power_down register.
• the main memory interface (mmi) waits until the completion of the current sdram transfer, if there is one still busy.
• the mmi brings sdram into the self-refresh state, goes into a wait state, and asserts the global signal global_power_down.
• all units that participate in the power down respond to the global_power_down signal by disabling their clocks.
• only the pll, interrupt controller, timers, wake-up logic, the pci bus interface, and any peripherals that have their sleepless control bit set continue to be clocked. the sdram clock continues.
• an interrupt is detected by the interrupt controller, or a unit that didn't participate in the power down requests a memory transfer.
• the mmi de-asserts the global_power_down signal, activating all blocks on the chip.
• the mmi recovers sdram from self-refresh.
• the mmi causes completion of the mmio operation that initiated the power down sequence.
• when software takes an interruptible branch operation, the interrupt that caused the wake-up will be serviced (if the wake-up was initiated by an interrupt).

21.5 mmio register power_down

the register power_down has an offset 0x100108 in the mmio aperture and has no content. writing to this register has the side-effect of powering down the chip. reading from this register returns an undefined value and has no side-effect.

21.6 block power down

this feature is new in pnx1300. it selectively shuts off a particular block or a set of blocks based on software programming.

this type of power down can be used in applications where certain blocks will never participate in the operation of the chip. the objective of having this type of power down is saving on power consumption.
each peripheral unit which can participate in the global power down can be selectively powered down. this is done by setting a control bit in mmio register block_power_down specifically for the block. the block_power_down register is located at mmio offset 0x103428. see figure 21-1 below.

setting a particular bit to '1' in this register has the effect of shutting off the corresponding block. writing '0' to this bit enables the power for the block again.

a block should not be powered down if it is active. the block's enable bit should be set to '0' before the block is powered down.

note: the unassigned bits of this register have to be written to '0' and read as '0'.

note: writing to the global power_down register (at offset 0x100108) has no effect on the contents of the block_power_down register (at offset 0x103428), and vice versa.

figure 21-1. power down register block_power_down (r/w), mmio_base offset: 0x10 3428 (fields shown: spdo, dvdd, ao, ai, evo, vi, ssi, vld, icp; bit ruler marks at 31, 27, 23, 19, 15, 11, 3, 0)
pci-xio external i/o bus chapter 22
by david wyland

22.1 summary functionality

in this document, the generic pnx1300 name refers to the pnx1300 series, or the pnx1300/01/02/11 products.

the pnx1300 pci-xio bus allows glueless connection to pci peripherals, 8-bit microprocessor peripherals and 8-bit memory devices. all these device types can be intermixed in a single pnx1300 system. the pci-xio bus provides the following features:
• all pci 2.1 features (32-bit, 33 mhz)
• simple, non-multiplexed, 8-bit data, 24-bit address xio bus with control signals for 68k and x86 style devices
• glueless connection to rom, eprom, flash eeprom, uarts, sram, etc.
• programmable internal or external bus clock source
• 0-7 programmable wait states for xio devices
• support for single byte read, single byte write, dma read or dma write
• the 16 mb of xio device space is visible as 16 mwords (64 mbytes) in the dspcpu memory map

22.1.1 description

the xio logic that implements the protocol for 8-bit devices appears as an on-chip pci target device to the rest of the pnx1300. it only responds when it is addressed by the pnx1300 as initiator and never responds to external pci masters. when it is addressed by the pnx1300 as an initiator, it responds to the pnx1300 pci biu as a normal slave device, activating pci_devsel#.

the xio logic serves as a bridge between the pci bus and xio devices such as roms, flash eproms and i/o device chips. the pnx1300 addresses xio devices on the pci-xio bus in the same way as registers or memory in any other pci slave device. the xio logic supplies the pci_trdy# signals to the pci bus and also supplies the chip-select, read, write and data-strobe signals to xio devices attached to the pci-xio bus.

a conceptual-only block diagram of the pci-xio bus is shown in figure 22-2. the real hardware uses the pci_ad[0:30] signals and pci_c/be#[0:3] signals for both pci and xio devices, as shown in figure 22-3.
the xio logic is activated when the enable bit in the xio_ctl register is asserted and whenever the pnx1300 (as initiator) addresses the pci-xio bus address range, as defined by a 6-bit address field in the xio bus control register. this 6-bit field defines the 6 most significant bits of the xio bus address space. when the pnx1300 sends out an address as an initiator, the upper 6 bits of the address are compared with this field. if they match, the pci-xio bus logic is activated.

the pci_intb# output is asserted to indicate that the pci-xio bus is active. it becomes active at pci data phase time. when xio is enabled, the pci_intb# signal becomes dedicated as xio bus chip-select, and turns from an open-drain output into a normal logic output. pci_intb# serves as a global chip select for all xio bus chips. when xio is disabled, pci_intb# is available for pci-specific use or as a general purpose software i/o pin with open-drain behavior as in tm-1000.

the address field bits in the xio bus control register serve as a base address register in pci terms. the xio bus control register is not a pci configuration register. it does not need to be a pci configuration register because the pci-xio bus can only be addressed by the pnx1300. it will not respond to requests by any other external pci device.

when the xio-pci bus controller logic is activated, it generates pci_devsel# as a response to the pci bus. when pci_irdy# has been received from the biu, it asserts an external pci_intb# signal as the global chip select. it also reconfigures the pci address/data pins for 8-bit byte transfers. when the pci-xio bus is active, the lower 24 bits of the external 32-bit pci bus are used to output a 24-bit address for all transfers, read or write. the upper 8 bits of the external pci bus are unchanged and transfer data normally. this is shown in figure 22-3.
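the activation condition described above can be sketched as a small c predicate. this is a software model for illustration only; the function name is hypothetical, and the bit positions (address field in bits 31:26, enable in bit 7) are taken from the xio_ctl register field table (table 22-2).

```c
#include <stdint.h>

/*
 * hypothetical model of the xio address decode:
 * the xio logic is active when the xio_ctl enable bit (bit 7) is set and the
 * upper 6 bits of the initiator address equal the xio_ctl address field
 * (bits 31:26).
 */
int xio_logic_active(uint32_t pci_addr, uint32_t xio_ctl)
{
    uint32_t enable = (xio_ctl >> 7) & 0x1u;
    return enable && ((pci_addr >> 26) == (xio_ctl >> 26));
}
```

with the address field set for a base of 0x58000000, any address from 0x58000000 through 0x5bffffff would select the xio logic in this model, which matches the 64 mbyte window seen by the dspcpu.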
the 24-bit address on the xio bus pins is the word address for the pci transfer, which is the lower 26 bits of the pci transfer address with the two least significant bits ignored. one word is transferred to or from the pci bus for each byte read or written on the xio bus.

in writes to the xio bus, a 32-bit word is transferred from the pci biu to the xio bus controller, but the lower 24 bits and the pci byte enables are ignored. in reads from the xio bus, a 32-bit word is transferred from the xio bus controller to the pci biu with the data in the upper 8 bits and the 24-bit address in the lower 24 bits. note that the 24-bit address returned in a read is the lower 26 bits of the pci transfer address with the two least significant bits truncated. for example, a pci transfer address of 44 hexadecimal would return a value of 11 hexadecimal as the lower 24 bits of the 32-bit data in a read.

the 24-bit xio bus address is generated by an address counter in the xio bus controller. this counter is loaded with the pci word address at pci frame time at the start of the
pci transfer and is incremented for each pci word transferred.

the xio bus does not generate parity during xio bus write transfers or check parity during xio bus read transfers. this allows the xio bus to interface to standard 8-bit devices without having to add parity-generation and check logic. while the xio bus is active, the xio bus logic inhibits parity checking and drives the pci parity and parity error pins so that they do not float.

word transfer is used to transfer the bytes to and from the pci bus for hardware simplicity. the primary intended use of the pci-xio bus is for slow devices, roms, flash eproms and i/o. because the pci-xio bus is so much slower than the pnx1300, there is time available for the pnx1300 to pack and unpack the words. in the case of roms and flash eproms, the data is typically compressed, requiring the pnx1300 cpu to both unpack and decompress the data.

the pci-xio bus controller logic reconfigures the byte enables as control signals for the attached xio bus chips during xio bus transfers. it also drives the pci_trdy# signal to the pci bus for each transfer. the pci bus byte enables are reconfigured to generate xio bus timing signals: read (iord), write (iowr) and data strobe (ds). these signals allow rom, flash eprom, 68k and x86 devices to be gluelessly interfaced to the xio bus.

figure 22-1. partial pnx1300 chip block diagram (blocks shown: dspcpu (400 mips, 2.5 gops) with i$ and d$, mmi to 32-bit sdram, video in, video out, audio in, audio out, i2c interface, image coprocessor, vld assist, synchronous serial i/f, jtag, and the pci and external i/o (pci-xio) bus interface on ad[31:0] serving pci i/o devices, xio i/o devices and a glueless flash eprom i/f)

for a single device, the pci_intb# line is used as the global
chip enable. if more than one device is to be added, an external decoder, such as a 74fct138, can be used to decode the upper bits of the 24-bit transfer address, with the pci_intb# line used as a global chip enable to the decoder.

the pci-xio bus controller has a wait state generator to provide timing for slow devices. the wait state generator allows the addition of up to 7 wait states for slow chip access and write times. the wait state generator logic generates the pci_trdy# signal to the pci bus.

the xio bus controller contains a clock generator for standalone systems. the pci-xio bus uses the pci clock. this clock is normally supplied by a pci bus central resource outside the pnx1300 chip. in standalone or low-cost systems, the internal clock generator can be used. the internal clock generator divides the pnx1300 highway clock by a 5-bit number in a prescaler. this allows setting bus clocks from 4 mhz to 66 mhz in a 133 mhz system. the internal clock generator programming is described in section 22.5, 'xio_ctl mmio register.'

22.2 block diagram

figure 22-2 shows a conceptual block diagram of the pci-xio bus as a slave device on the pci bus. the xio bus controller generates an xio bus, which is an 8-bit bus with a 24-bit address. devices attached to the xio bus appear as memory locations in the 16 mb address space of the xio bus.

figure 22-3 shows an implementation block diagram of the pci-xio bus. to conserve pins, the xio bus controller uses the pci i/o pins as xio bus pins during xio bus data transfers. it reconfigures the 32 pci address/data pins as 8 xio bus data pins and 24 xio bus address pins, and it reconfigures the byte enable pins as xio bus timing signals. by changing the functions of the pins during the transfer, 36 pins are saved which would otherwise be required to drive the xio bus devices.
by reconfiguring the pci pins only during the data phase of the xio bus transfers, the pci-xio bus retains its pci bus compatibility. figure 22-4 shows a more detailed block diagram of the pci-xio bus controller.

figure 22-2. pci-xio bus device conceptual block diagram (blocks shown: pnx1300 sdram data highway, pci bus interface unit (biu), pci bus with pci devices and pci host, xio bus controller, xio bus with 8-bit data + 24-bit addresses serving rom and x86 devices; note in figure: for address & data, these use the same pins/wires)
figure 22-3. pci-xio bus device implementation block diagram (blocks shown: pnx1300 sdram data highway, pci bus interface unit (biu), xio bus controller and mux inside pnx1300; pci devices and pci host on pci_ad[31:0]; rom, x86 device etc. on the xio bus using pci_ad[23:0] and pci_ad[31:24]; pci_intb# = xio bus active as target)

figure 22-4. pci-xio bus interface controller block diagram (blocks shown: pnx1300 sdram data highway, pci bus interface unit (biu) as pnx1300 initiator, xio config reg, clock, bus timing, xio controls + wait states, mux; signals shown: pci_ad[31:24] for data, pci_ad[23:00] for address, pci_c/be0#: iord#, pci_c/be1#: iowr#, pci_c/be2#: ds#, pci_c/be3#, pci_intb# = chip enable, pci_inta#, intc#, intd#, pci_clk, pci_trdy#, pci_devsel#, pci_req#, pci_gnt#, pci controls: frame, etc.; note in figure: tie req to gnt for stand alone (no host) case)
22.3 data formats

the data transfer formats for the pci-xio bus are shown in figure 22-5. the 8-bit data field is the data transferred to or from the pci-xio bus. the read address is the 24-bit address on the pci-xio bus address lines when the read transfer takes place.

figure 22-5. pci-xio bus data formats (read, xio bus to pci: bits 31:24 = data, bits 23:0 = read address; write, pci to xio bus: bits 31:24 = data, bits 23:0 = unused)

table 22-1. pci-xio bus signal definitions
pnx1300 pci signal | pins | i/o | pci function | xio function
pci_intb# | 1 | o | - | pci-xio bus enable = xio bus active as target device
pci_ad[23:0] | 24 | i/o | pci address/data | address bus: 16 mb
pci_ad[31:24] | 8 | i/o | pci address/data | data bus: 8 bits
pci_par | 1 | o | even parity for ad & c/be | -
pci_c/be0# | 1 | - | command/byte enables (on xio read, be[3:0] = 0110b?4; on xio write, be[3:0] = 0111b?4) | iord# = read enable
pci_c/be1# | 1 | - | command/byte enables | iowr# = write enable
pci_c/be2# | 1 | - | command/byte enables | ds# = data strobe
pci_c/be3# | 1 | - | command/byte enables | unused
pci_clk | 1 | i/o | 33 mhz pci clock | can optionally be generated by pnx1300 on board osc
pci_frame# | 1 | i/o | pci address/command strobe + transfer in progress | -
pci_devsel# | 1 | i/o | device select valid | asserted by pnx1300 = xio active
pci_irdy# | 1 | i/o | initiator ready = transfer in progress | -
pci_trdy# | 1 | i/o | target ready | asserted by pnx1300 = xio transfer timing
pci_stop# | 1 | i/o | target requests stop of transaction | -
pci_idsel# | 1 | i | chip select for pci config writes | -
pci_req# | 1 | o | pnx1300 requesting pci bus | -
pci_gnt# | 1 | i | pnx1300 is granted pci bus | -
pci_perr# | 1 | i | parity error to pnx1300 | -
pci_serr | 1 | o | system error from pnx1300 | -
pci_inta# | 1 | i/o | general purpose i/o | -
pci_intb# | 1 | i/o | general purpose i/o | xio bus active = global chip select
pci_intc# | 1 | i/o | general purpose i/o | -
pci_intd# | 1 | i/o | general purpose i/o | -

22.4 interface

22.4.1 pci-xio bus interface design

the pci-xio bus can accommodate a variety of different devices and bus protocols. the following are examples of devices interfaced to the pci-xio bus.
22.4.1.1 flash eeprom

figure 22-6 shows an 8-bit flash eeprom interfaced to the pci-xio bus. examples of these devices are the micron mt28f200c1 and the amd 29lv400.

figure 22-6. 8-bit flash eeprom interface (128kx8 eeprom; address = pci_ad[16:0], write enable = pci_c/be1#: iowr#, output enable = pci_c/be0#: iord#, chip select = pci_intb#, data = pci_ad[31:24])

22.4.1.2 68k bus i/o device

figure 22-7 shows a 68k bus i/o device interfaced to the pci-xio bus. example devices are the motorola mc68hc681 duart and the mc68hc901 multi-function peripheral.

figure 22-7. 8-bit 68k bus device interface (68k bus device; address = pci_ad[23:0], r/w# = pci_c/be1#: iowr#, ds# = pci_c/be2: ds#, chip select = pci_intb#, data = pci_ad[31:24], clk = pci_clk)

22.4.1.3 x86/isa bus i/o device

figure 22-8 shows an x86 or isa bus i/o device interfaced to the pci-xio bus. an example device is the intel 82091 advanced integrated peripheral (aip).

figure 22-8. 8-bit x86 / isa bus device interface (x86 or isa bus device; address = pci_ad[23:0], i/o read enable = pci_c/be0#: iord#, i/o write enable = pci_c/be1#: iowr#, chip select = pci_intb#, data = pci_ad[31:24], bale = pci_clk)

22.4.1.4 multiple flash eeprom

figure 22-9 shows two 8-bit flash eeproms interfaced to the pci-xio bus. a 74fct138 logic chip decodes upper bits pci_ad[19-17] of the xio bus address to generate the chip selects for the two eeproms. these bits decode the address space into blocks of 128 kb. the address range of each enable is shown on the enable lines. six spare chip selects are available for attaching up to six more eeproms or to attach other devices. the 74fct138 provides both decode of the address bits and the and function for the pci_intb# global chip enable
signal so that only one eeprom chip enable signal is active at global chip enable time.

figure 22-9. multiple 8-bit flash eeprom interface (two 128kx8 eeproms, each with address = pci_ad[16:0], write enable = pci_c/be1#: iowr#, output enable = pci_c/be0#: iord#, data = pci_ad[31:24]; a 74fct138 with a[2-0] = pci_ad[19-17] and pci_intb# as enable generates chip selects o0 to o7 for the ranges 0-128k, 128-256k, 256-384k, 384-512k, 512-640k, 640-768k, 768-896k and 896-1024k)

22.5 xio_ctl mmio register

the pci-xio bus controller has one programmer-visible mmio register: xio_ctl. its format is shown in table 22-2. to ensure compatibility with future devices, any undefined mmio bits should be ignored when read, and written as '0's.

table 22-2. xio_ctl register fields: mmio address 0x10 3060
field | bits | function | reset value
address | 31:26 | xio address space | undefined
- | 25:11 | unused | 0
wait states | 10:8 | wait states | 0
enable | 7 | enable xio bus operation | 0 = disabled
- | 6:5 | unused | -
clock frequency | 4:0 | clock divider | 0x1f

22.5.1 pci_clk bus clock frequency

pci_clk, the clock for the pci and pci-xio bus, can be supplied externally or internally. this is determined at boot time by the 'enable internal pci_clk generator' bit, bit 6 of byte 9 in the boot eeprom. refer to section 13.2 on page 13-2. if this bit = '0', pci_clk acts compatible with tm-1000 and normal pci operation, i.e. pci_clk is an input pin that takes the pci clock from the external world. if this bit = '1', an on-chip clock divider in the xio logic becomes the source of pci_clk, and the pci_clk pin is configured as an output. in the latter case, the pci_clk frequency can be programmed to a divided-down version of the pnx1300 highway clock by setting the xio_ctl register 'clock frequency' divider value.

table 22-3. pci_clk frequencies for 133.0 mhz pnx1300 highway clock
clock frequency (use odd values) | pnx1300 clocks | pci-xio clock period, ns | frequency, mhz
0 | illegal | illegal | illegal
1 | 2 | 15 | 66.5
2 | 3 | 22.5 | 44.33
3 | 4 | 30 | 33.25
... | ... | ... | ...
30 | 31 | 233 | 4.29
31 | 32 | 241 | 4.16
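the register layout of table 22-2 and the divider relationship visible in table 22-3 can be sketched in c as follows. this is a minimal sketch, not a vendor api: the function names are hypothetical, and the rule that pci_clk equals the highway clock divided by (clock frequency field + 1) is read off the 'pnx1300 clocks' column of table 22-3 rather than stated as a formula in the text.

```c
#include <stdint.h>

/*
 * hypothetical helper: pack an xio_ctl value from its fields (table 22-2).
 * unused bits (25:11 and 6:5) are written as 0.
 */
uint32_t xio_ctl_pack(uint32_t base_addr, unsigned wait_states,
                      unsigned enable, unsigned clock_field)
{
    return (base_addr & 0xfc000000u)                /* bits 31:26: xio address space */
         | ((uint32_t)(wait_states & 0x7u) << 8)    /* bits 10:8: wait states */
         | ((uint32_t)(enable & 0x1u) << 7)         /* bit 7: enable xio bus operation */
         | (uint32_t)(clock_field & 0x1fu);         /* bits 4:0: clock divider */
}

/*
 * hypothetical helper: pci_clk in hz for a given highway clock and
 * 'clock frequency' field value; returns 0.0 for the illegal value 0
 * and for values that do not fit the 5-bit field.
 */
double pci_clk_hz(double highway_hz, unsigned clock_field)
{
    if (clock_field == 0 || clock_field > 31) {
        return 0.0;
    }
    return highway_hz / (double)(clock_field + 1);
}
```

for example, a field value of 3 on a 133.0 mhz highway clock gives a divide-by-4 and the 33.25 mhz entry of table 22-3; odd field values give even dividers and therefore a 50% duty cycle.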
a table of pci-xio bus clock frequencies versus clock field values is shown in table 22-3. note that the pci_clk operating frequency should be set to observe the frequency limits given in the ac/dc timing characterization data for pnx1300. odd values of 'clock frequency' are recommended, resulting in an even divider, which generates a 50% duty cycle pci_clk.

22.5.2 wait state generator

the xio bus controller has an automatic wait state generator to allow for read and write cycle times of devices on the xio bus.

table 22-4. wait state generator codes
code | wait states
0 | 0
1 | 1
2 | 2
... | ...
7 | 7

22.6 pci-xio bus timing

the timing for the pci-xio bus is shown below. note that the 'fat' lines indicate active drive by pnx1300. thin lines indicate areas where the pnx1300 is not actively driving. (in these areas, pull-up resistors retain the signal high for control signals; pci_ad lines are left floating.)

figure 22-10 shows the timing for a single byte read transfer. figure 22-11 shows the timing for a single byte read transfer with wait states. figure 22-14 shows the timing for a dma burst read transfer of 2 bytes, and figure 22-16 shows the timing for a dma burst write transfer of 2 bytes. the dma burst transfers are shown at maximum rate, with zero wait states. dma burst transfers with wait states insert wait states between the transfers. in the read case, the iord# enable and ds# are extended by the wait states. in the write case, the iowr# enable and ds# are delayed by the wait states.

figure 22-10. pci-xio bus timing: single byte read, 0 wait states & address setup (timing diagram; signals: pci_clk, pci_frame#, pci_irdy#, pci_trdy#, pci_devsel#, pci_ad[23:0] (addr), pci_ad[31:24] (data), pci_intb#/ce#, pci_c/be2#/ds#, pci_c/be1#/iowr#, pci_c/be0#/iord#; phases: frame time, bus turnaround, xio transfer, read sample point, bus idle)
figure 22-11. pci-xio bus timing: single byte read, 1 or more wait states & address setup (timing diagram; signals: pci_clk, pci_frame#, pci_irdy#, pci_trdy#, pci_devsel#, pci_ad[23:0] (addr), pci_ad[31:24] (data), pci_intb#/ce#, pci_c/be2#/ds#, pci_c/be1#/iowr#, pci_c/be0#/iord#; phases: frame time, bus turnaround, wait (k times), xio transfer, read sample point)

figure 22-12. pci-xio bus timing: single byte write, 0 wait states (timing diagram; same signals; phases: frame time, write cycle, data hold time, bus idle)
figure 22-13. pci-xio bus timing: single byte write, 1 or more wait states (timing diagram; signals: pci_clk, pci_frame#, pci_irdy#, pci_trdy#, pci_devsel#, pci_ad[23:0] (addr), pci_ad[31:24] (data), pci_intb#/ce#, pci_c/be2#/ds#, pci_c/be1#/iowr#, pci_c/be0#/iord#; phases: frame time, wait (k), write cycle, data hold time, bus idle)

figure 22-14. pci-xio bus timing: dma burst read, 2 bytes, 0 wait states & address setup (timing diagram; same signals; phases: frame time, bus turnaround, xio data 1, xio data 2, read sample points, bus idle)
figure 22-15. pci-xio bus timing: dma burst read, 2 bytes, 1 or more wait states (timing diagram; signals: pci_clk, pci_frame#, pci_irdy#, pci_trdy#, pci_devsel#, pci_ad[23:0] (addr), pci_ad[31:24] (data), pci_intb#/ce#, pci_c/be2#/ds#, pci_c/be1#/iowr#, pci_c/be0#/iord#; phases: frame, turn, wait(k), data 1, wait(k), data 2, read sample points)

figure 22-16. pci-xio bus timing: dma burst write, 2 bytes, 1 or more wait states (timing diagram; same signals; phases: frame, wait(k), data1, hold, wait(k), data2, hold, idle)
22.7 pci-xio bus controller operation and programming

the pci-xio bus is a pci target device. all valid pci transfers with pnx1300 as the initiator are allowed, including single word and dma transfers. when data is read from the pci-xio bus, it reads as a 32-bit word with the 8 bits of data as the most significant byte and the 24-bit xio bus transfer address as the least significant bytes. when data is written to the pci-xio bus, it is written as a word, but only the most significant byte of the data is transferred to the bus. the lower 24 bits are ignored, as they are replaced by the lower 24 bits of the transfer address before being placed on the bus.

before the pci-xio bus can be used, the pci-xio bus control register must be set up. this register must be loaded with the base address for the pci-xio bus and the control fields for clock frequency, wait states per transfer and pci-xio bus enable.

to read a single byte from a pci-xio bus device, first define the 24-bit address for the device. this might be the address in an eprom for the desired byte. multiply this device address by four to convert it to a word address and add the xio bus base address. the combined address is the pci transfer address. use this address as the transfer address for a single word dspcpu load. table 22-5 shows examples of this address conversion. at the completion of the load, the data received will consist of 8 bits of data and the 24-bit device address.

to write a byte, use the same transfer address and write a word to this address with the desired data as the most significant byte of the word written.

to transfer data between the xio-pci bus and the sdram using the pci dma capability, set the src_adr or the dest_adr register to the pci-xio bus transfer address, depending on the direction of the transfer.
the pci-xio bus transfer address is four times the starting address as seen on the pci-xio bus address pins plus the pci-xio bus controller base address. this is the starting address for the pci-xio bus transfer. set the other address, destination or source, to the desired starting address in sdram. set the pci_dma_ctl register for the desired direction and set the transfer count to four times the number of pci-xio bus bytes to be transferred. the transfer count is four times the pci-xio bus bytes to be transferred because the pci-xio bus transfers one word to or from the pci bus for each byte transferred to or from devices on the pci-xio bus.

figure 22-17. pci-xio bus timing: dma burst write, 2 bytes, 0 wait states (timing diagram; signals: pci_clk, pci_frame#, pci_irdy#, pci_trdy#, pci_devsel#, pci_ad[23:0] (addr), pci_ad[31:24] (data), pci_intb#/ce#, pci_c/be2#/ds#, pci_c/be1#/iowr#, pci_c/be0#/iord#; phases: frame, data1, hold, data 2, hold, bus idle)

table 22-5. pci to xio bus address conversion examples
xio bus address in hex | pci word address in hex | xio-pci base address in hex | pci transfer address in hex
11 | 44 | 5800 0000 | 5800 0044
0123 | 048c | 5800 0000 | 5800 048c
11 0012 | 44 0048 | 5800 0000 | 5844 0048

word transfer is used to transfer the bytes to and from the pci bus for hardware simplicity. additional hardware could be added to pack and unpack bytes, but this is an unnecessary complication given the speed of the pci-xio bus relative to the speed of the pnx1300 bus and cpu. the primary intended use of the pci-xio bus is for roms, flash eproms and i/o devices. because the pci-xio bus is so much slower than the pnx1300, there
is time available for the pnx1300 to pack and unpack the words. at three pci-xio bus wait states, at least 120 nanoseconds are required for each byte transferred. this corresponds to 12 cpu instructions at 100 mhz. the cpu may need to process each byte of data anyway. in the case of roms and flash eproms, the data is typically compressed, requiring the pnx1300 cpu to both unpack and decompress the data.
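the address conversion and word formats of section 22.7 can be illustrated with a few c helpers. this is a sketch for checking the arithmetic only; all function names are hypothetical, and the base address 0x58000000 used in the example is the one from table 22-5.

```c
#include <stdint.h>

/* pci transfer address = xio base address + 4 * 24-bit xio device address */
uint32_t xio_transfer_address(uint32_t xio_base, uint32_t xio_addr)
{
    return xio_base + ((xio_addr & 0x00ffffffu) << 2);
}

/* a word read from the pci-xio bus carries the data byte in bits 31:24 ... */
uint8_t xio_read_data(uint32_t word)
{
    return (uint8_t)(word >> 24);
}

/* ... and the 24-bit xio bus address in bits 23:0 */
uint32_t xio_read_address(uint32_t word)
{
    return word & 0x00ffffffu;
}

/* a byte is written as the most significant byte of a word; bits 23:0 are ignored */
uint32_t xio_write_word(uint8_t data)
{
    return (uint32_t)data << 24;
}

/* dma transfer count is four times the number of pci-xio bus bytes */
uint32_t xio_dma_count(uint32_t xio_bytes)
{
    return 4u * xio_bytes;
}
```

with a base of 0x58000000, the xio addresses 0x11, 0x0123 and 0x110012 map to the transfer addresses 0x58000044, 0x5800048c and 0x58440048, reproducing the rows of table 22-5.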
DSPCPU Operations
Appendix A
by Gert Slavenburg, Marcel Janssens

A.1 Alphabetic Operation List

The following table lists the complete operation set of the PNX1300's DSPCPU. Note that this is not an instruction list; a DSPCPU instruction contains from one to five of these operations.

A: alloc, allocd, allocr, allocx, asl, asli, asr, asri
B: bitand, bitandinv, bitinv, bitor, bitxor, borrow
C: carry, curcycles, cycles
D: dcb, dinvalid, dspiabs, dspiadd, dspidualabs, dspidualadd, dspidualmul, dspidualsub, dspimul, dspisub, dspuadd, dspumul, dspuquadaddui, dspusub, dualasr, dualiclipi, dualuclipi
F: fabsval, fabsvalflags, fadd, faddflags, fdiv, fdivflags, feql, feqlflags, fgeq, fgeqflags, fgtr, fgtrflags, fleq, fleqflags, fles, flesflags, fmul, fmulflags, fneq, fneqflags, fsign, fsignflags, fsqrt, fsqrtflags, fsub, fsubflags, funshift1, funshift2, funshift3
H: h_dspiabs, h_dspidualabs, h_iabs, h_st16d, h_st32d, h_st8d, hicycles
I: iabs, iadd, iaddi, iavgonep, ibytesel, iclipi, iclr, ident, ieql, ieqli, ifir16, ifir8ii, ifir8ui, ifixieee, ifixieeeflags, ifixrz, ifixrzflags, iflip, ifloat, ifloatflags, ifloatrz, ifloatrzflags, igeq, igeqi, igtr, igtri, iimm, ijmpf, ijmpi, ijmpt, ild16, ild16d, ild16r, ild16x, ild8, ild8d, ild8r, ileq, ileqi, iles, ilesi, imax, imin, imul, imulm, ineg, ineq, ineqi, inonzero, isub, isubi, izero
J: jmpf, jmpi, jmpt
L: ld32, ld32d, ld32r, ld32x, lsl, lsli, lsr, lsri
M: mergedual16lsb, mergelsb, mergemsb
N: nop
P: pack16lsb, pack16msb, packbytes, pref, pref16x, pref32x, prefd, prefr
Q: quadavg, quadumax, quadumin, quadumulmsb
R: rdstatus, rdtag, readdpc, readpcsw, readspc, rol, roli
S: sex16, sex8, st16, st16d, st32, st32d, st8, st8d
U: ubytesel, uclipi, uclipu, ueql, ueqli, ufir16, ufir8uu, ufixieee, ufixieeeflags, ufixrz, ufixrzflags, ufloat, ufloatflags, ufloatrz, ufloatrzflags, ugeq, ugeqi, ugtr, ugtri, uimm, uld16, uld16d, uld16r, uld16x, uld8, uld8d, uld8r, uleq, uleqi, ules, ulesi, ume8ii, ume8uu, umin, umul, umulm, uneq, uneqi
W: writedpc, writepcsw, writespc
Z: zex16, zex8
A.2 Operation List by Function

Load/store operations: alloc, allocd, allocr, allocx, h_st16d, h_st32d, h_st8d, ild16, ild16d, ild16r, ild16x, ild8, ild8d, ild8r, ld32, ld32d, ld32r, ld32x, pref, pref16x, pref32x, prefd, prefr, st16, st16d, st32, st32d, st8, st8d, uld16, uld16d, uld16r, uld16x, uld8, uld8d, uld8r

Shift operations: asl, asli, asr, asri, funshift1, funshift2, funshift3, lsl, lsli, lsr, lsri, rol, roli

Logical operations: bitand, bitandinv, bitinv, bitor, bitxor

DSP operations: dspiabs, dspiadd, dspidualabs, dspidualadd, dspidualmul, dspidualsub, dspimul, dspisub, dspuadd, dspumul, dspuquadaddui, dspusub, dualasr, dualiclipi, dualuclipi, h_dspiabs, h_dspidualabs, iclipi, ifir16, ifir8ii, ifir8ui, iflip, imax, imin, quadavg, quadumax, quadumin, quadumulmsb, uclipi, uclipu, ufir16, ufir8uu, ume8ii, ume8uu, umin

Floating-point arithmetic: fabsval, fabsvalflags, fadd, faddflags, fdiv, fdivflags, fmul, fmulflags, fsign, fsignflags, fsqrt, fsqrtflags, fsub, fsubflags

Floating-point conversion: ifixieee, ifixieeeflags, ifixrz, ifixrzflags, ifloat, ifloatflags, ifloatrz, ifloatrzflags, ufixieee, ufixieeeflags, ufixrz, ufixrzflags, ufloat, ufloatflags, ufloatrz, ufloatrzflags

Floating-point relationals: feql, feqlflags, fgeq, fgeqflags, fgtr, fgtrflags, fleq, fleqflags, fles, flesflags, fneq, fneqflags

Integer arithmetic: borrow, carry, h_iabs, iabs, iadd, iaddi, iavgonep, ident, imul, imulm, ineg, inonzero, isub, isubi, izero, umul, umulm

Immediate operations: iimm, uimm

Sign/zero extend ops: sex16, sex8, zex16, zex8

Integer relationals: ieql, ieqli, igeq, igeqi, igtr, igtri, ileq, ileqi, iles, ilesi, ineq, ineqi, ueql, ueqli, ugeq, ugeqi, ugtr, ugtri, uleq, uleqi, ules, ulesi, uneq, uneqi

Control-flow operations: ijmpf, ijmpi, ijmpt, jmpf, jmpi, jmpt

Special-register ops: cycles, curcycles, hicycles, nop, readdpc, readpcsw, readspc, writedpc, writepcsw, writespc

Cache operations: dcb, dinvalid, iclr, rdstatus, rdtag

Pack/merge/select ops: ibytesel, mergedual16lsb, mergelsb, mergemsb, pack16lsb, pack16msb, packbytes, ubytesel
alloc
Allocate a cache block
Pseudo-op for allocd(0)

Syntax
  [ if rguard ] alloc rsrc1

Function
  if rguard then {
    cache_block_mask = ~(cache_block_size - 1)
    allocate a data cache block with [(rsrc1 + 0) & cache_block_mask] address
  }

Attributes
  Function unit: dmemspec
  Operation code: 213
  Number of operands: 1
  Modifier: -
  Modifier range: -
  Latency: -
  Issue slots: 5

Description
The alloc operation is a pseudo operation transformed by the scheduler into an allocd(0) with the same arguments. (Note: pseudo operations cannot be used in assembly files.)
The alloc operation allocates a cache block with the address computed from [(rsrc1 + 0) & cache_block_mask] and sets the status of this cache block as valid. No data is fetched from main memory for this operation. The allocated cache block data is undefined after this operation. It is the responsibility of the programmer to update the allocated cache block by store operations.
Refer to the "Cache Architecture" section for details on the cache block size.
The alloc operation optionally takes a guard, specified in rguard. If a guard is present, its LSB controls the execution of the alloc operation. If the LSB of rguard is 1, the alloc operation is executed; otherwise, it is not executed.

Examples
  Initial values | Operation | Result
  r10 = 0xabcd, cache_block_size = 0x40 | alloc r10 | Allocates a cache block for the address space from 0xabc0 to 0xabff without fetching the data from main memory; the data in this address space is undefined.
  r10 = 0xabcd, r11 = 0, cache_block_size = 0x40 | if r11 alloc r10 | Since the guard is false, the alloc operation is not executed.
  r10 = 0xac0f, r11 = 1, cache_block_size = 0x40 | if r11 alloc r10 | Allocates a cache block for the address space from 0xac00 to 0xac3f without fetching the data from main memory; the data in this address space is undefined.

See also: allocd, allocr, allocx
allocd
Allocate a cache block with displacement

Syntax
  [ if rguard ] allocd(d) rsrc1

Function
  if rguard then {
    cache_block_mask = ~(cache_block_size - 1)
    allocate a data cache block with [(rsrc1 + d) & cache_block_mask] address
  }

Attributes
  Function unit: dmemspec
  Operation code: 213
  Number of operands: 1
  Modifier: 7 bits
  Modifier range: -255..252 by 4
  Latency: -
  Issue slots: 5

Description
The allocd operation allocates a cache block with the address computed from [(rsrc1 + d) & cache_block_mask] and sets the status of this cache block as valid. No data is fetched from main memory for this operation. The allocated cache block data is undefined after this operation. It is the responsibility of the programmer to update the allocated cache block by store operations.
Refer to the "Cache Architecture" section for details on the cache block size.
The allocd operation optionally takes a guard, specified in rguard. If a guard is present, its LSB controls the execution of the allocd operation. If the LSB of rguard is 1, the allocd operation is executed; otherwise, it is not executed.

Examples
  Initial values | Operation | Result
  r10 = 0xabcd, cache_block_size = 0x40 | allocd(0x32) r10 | Allocates a cache block for the address space from 0xabc0 to 0xabff without fetching the data from main memory; the data in this address space is undefined.
  r10 = 0xabcd, r11 = 0, cache_block_size = 0x40 | if r11 allocd(0x32) r10 | Since the guard is false, the allocd operation is not executed.
  r10 = 0xabff, r11 = 1, cache_block_size = 0x40 | if r11 allocd(0x4) r10 | Allocates a cache block for the address space from 0xac00 to 0xac3f without fetching the data from main memory; the data in this address space is undefined.

See also: allocr, allocx
allocr
Allocate a cache block with index

Syntax
  [ if rguard ] allocr rsrc1 rsrc2

Function
  if rguard then {
    cache_block_mask = ~(cache_block_size - 1)
    allocate a data cache block with [(rsrc1 + rsrc2) & cache_block_mask] address
  }

Attributes
  Function unit: dmemspec
  Operation code: 214
  Number of operands: 2
  Modifier: No
  Modifier range: -
  Latency: -
  Issue slots: 5

Description
The allocr operation allocates a cache block with the address computed from [(rsrc1 + rsrc2) & cache_block_mask] and sets the status of this cache block as valid. No data is fetched from main memory for this operation. The allocated cache block data is undefined after this operation. It is the responsibility of the programmer to update the allocated cache block by store operations.
Refer to the "Cache Architecture" section for details on the cache block size.
The allocr operation optionally takes a guard, specified in rguard. If a guard is present, its LSB controls the execution of the allocr operation. If the LSB of rguard is 1, the allocr operation is executed; otherwise, it is not executed.

Examples
  Initial values | Operation | Result
  r10 = 0xabcd, r12 = 0x32, cache_block_size = 0x40 | allocr r10 r12 | Allocates a cache block for the address space from 0xabc0 to 0xabff without fetching the data from main memory; the data in this address space is undefined.
  r10 = 0xabcd, r11 = 0, r12 = 0x32, cache_block_size = 0x40 | if r11 allocr r10 r12 | Since the guard is false, the allocr operation is not executed.
  r10 = 0xabff, r11 = 1, r12 = 0x4, cache_block_size = 0x40 | if r11 allocr r10 r12 | Allocates a cache block for the address space from 0xac00 to 0xac3f without fetching the data from main memory; the data in this address space is undefined.

See also: allocd, allocx
allocx
Allocate a cache block with scaled index

Syntax
  [ if rguard ] allocx rsrc1 rsrc2

Function
  if rguard then {
    cache_block_mask = ~(cache_block_size - 1)
    allocate a data cache block with [(rsrc1 + 4 × rsrc2) & cache_block_mask] address
  }

Attributes
  Function unit: dmemspec
  Operation code: 215
  Number of operands: 2
  Modifier: No
  Modifier range: -
  Latency: -
  Issue slots: 5

Description
The allocx operation allocates a cache block with the address computed from [(rsrc1 + 4 × rsrc2) & cache_block_mask] and sets the status of this cache block as valid. No data is fetched from main memory for this operation. The allocated cache block data is undefined after this operation. It is the responsibility of the programmer to update the allocated cache block by store operations.
Refer to the "Cache Architecture" section for details on the cache block size.
The allocx operation optionally takes a guard, specified in rguard. If a guard is present, its LSB controls the execution of the allocx operation. If the LSB of rguard is 1, the allocx operation is executed; otherwise, it is not executed.

Examples
  Initial values | Operation | Result
  r10 = 0xabcd, r12 = 0xc, cache_block_size = 0x40 | allocx r10 r12 | Allocates a cache block for the address space from 0xabc0 to 0xabff without fetching the data from main memory; the data in this address space is undefined.
  r10 = 0xabcd, r11 = 0, r12 = 0xc, cache_block_size = 0x40 | if r11 allocx r10 r12 | Since the guard is false, the allocx operation is not executed.
  r10 = 0xabff, r11 = 1, r12 = 0x4, cache_block_size = 0x40 | if r11 allocx r10 r12 | Allocates a cache block for the address space from 0xac00 to 0xac3f without fetching the data from main memory; the data in this address space is undefined.

See also: allocd, allocr
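The four alloc variants differ only in how the offset added to rsrc1 is formed before the block mask is applied. The following Python sketch models just that address arithmetic so the examples above can be checked by hand. It is an illustration, not PNX1300 code: the function names mirror the mnemonics, the helper `alloc_block_range` is hypothetical, the guard is omitted, and wrapping the sum to 32 bits is an assumption.

```python
MASK32 = 0xFFFFFFFF  # assumption: addresses wrap at 32 bits


def alloc_block_range(rsrc1, offset, cache_block_size=0x40):
    """Return (first, last) byte address of the cache block that would be allocated.

    Hypothetical helper: models [(rsrc1 + offset) & cache_block_mask] only.
    """
    cache_block_mask = ~(cache_block_size - 1) & MASK32
    first = ((rsrc1 + offset) & MASK32) & cache_block_mask
    return first, first + cache_block_size - 1


def alloc(rsrc1, cache_block_size=0x40):
    # pseudo-op: same as allocd(0)
    return alloc_block_range(rsrc1, 0, cache_block_size)


def allocd(d, rsrc1, cache_block_size=0x40):
    # displacement d comes from the operation modifier
    return alloc_block_range(rsrc1, d, cache_block_size)


def allocr(rsrc1, rsrc2, cache_block_size=0x40):
    # index register added unscaled
    return alloc_block_range(rsrc1, rsrc2, cache_block_size)


def allocx(rsrc1, rsrc2, cache_block_size=0x40):
    # index register scaled by 4
    return alloc_block_range(rsrc1, 4 * rsrc2, cache_block_size)


if __name__ == "__main__":
    first, last = allocx(0xABCD, 0xC)
    print(hex(first), hex(last))
```

With cache_block_size = 0x40 the mask clears the low six address bits, which is why 0xabcd and 0xabfd fall in the same block.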
asl
Arithmetic shift left

Syntax
  [ if rguard ] asl rsrc1 rsrc2 rdest

Function
  if rguard then {
    n <- rsrc2<4:0>
    rdest<31:n> <- rsrc1<31-n:0>
    rdest<n-1:0> <- 0
    if rsrc2<31:5> != 0 {
      rdest <- 0
    }
  }

Attributes
  Function unit: shifter
  Operation code: 19
  Number of operands: 2
  Modifier: No
  Modifier range: -
  Latency: 1
  Issue slots: 1, 2

Description
As shown below, the asl operation takes two arguments, rsrc1 and rsrc2. rsrc2 specifies an unsigned shift amount, and rdest is set to rsrc1 arithmetically shifted left by this amount. A nonzero value in rsrc2<31:5> is treated as a shift by 32 or more bits, so rdest is set to 0. Zeros are shifted into the LSBs of rdest while the MSBs shifted out of rsrc1 are lost.
The asl operation optionally takes a guard, specified in rguard. If a guard is present, its LSB controls the modification of the destination register. If the LSB of rguard is 1, rdest is written; otherwise, rdest is unchanged.

Examples
  Initial values | Operation | Result
  r60 = 0x20, r30 = 3 | asl r60 r30 r90 | r90 <- 0x100
  r10 = 0, r60 = 0x20, r30 = 3 | if r10 asl r60 r30 r100 | no change, since guard is false
  r20 = 1, r60 = 0x20, r30 = 3 | if r20 asl r60 r30 r110 | r110 <- 0x100
  r70 = 0xfffffffc, r40 = 2 | asl r70 r40 r120 | r120 <- 0xfffffff0
  r80 = 0xe, r50 = 0xfffffffe | asl r80 r50 r125 | r125 <- 0x00000000 (shift by more than 32)
  r30 = 0x7008000f, r60 = 0x20 | asl r30 r60 r111 | r111 <- 0x00000000
  r30 = 0x8008000f, r45 = 0x80000000 | asl r30 r45 r100 | r100 <- 0x00000000
  r30 = 0x8008000f, r45 = 0x23 | asl r30 r45 r100 | r100 <- 0x00000000

[Figure: left shifter. The 32 bits of rsrc1 are shifted left by the amount in rsrc2 and zeros fill the vacated LSBs of rdest; the intermediate result is drawn for n = 3.]

See also: asli, asr, asri, lsl, lsli, lsr, lsri, rol, roli
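The behavior for shift amounts of 32 or more is the part of asl that is easiest to get wrong when emulating it on a host. A minimal Python sketch of the function above follows, assuming registers are held as unsigned 32-bit integers; the names `asl` and `guarded` are illustrative helpers, not part of any Philips tool.

```python
MASK32 = 0xFFFFFFFF


def asl(rsrc1, rsrc2):
    """Model of asl: shift left by rsrc2<4:0>, or return 0 if rsrc2<31:5> != 0."""
    rsrc1 &= MASK32
    rsrc2 &= MASK32
    if rsrc2 >> 5:            # any of bits 31..5 set: shift by 32 or more
        return 0
    n = rsrc2 & 0x1F
    return (rsrc1 << n) & MASK32  # bits shifted out of the MSB end are lost


def guarded(rguard, new_value, old_rdest):
    """Model of the guard: rdest is written only if the LSB of rguard is 1."""
    return new_value if (rguard & 1) else old_rdest


if __name__ == "__main__":
    print(hex(asl(0x20, 3)))
    print(hex(asl(0xE, 0xFFFFFFFE)))
```

Note that a plain host-language shift would not reproduce the last three data book examples; the explicit test of rsrc2<31:5> is what forces the zero result.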
asli
Arithmetic shift left immediate

Syntax
  [ if rguard ] asli(n) rsrc1 rdest

Function
  if rguard then {
    rdest<31:n> <- rsrc1<31-n:0>
    rdest<n-1:0> <- 0
  }

Attributes
  Function unit: shifter
  Operation code: 11
  Number of operands: 1
  Modifier: 7 bits
  Modifier range: 0..31
  Latency: 1
  Issue slots: 1, 2

Description
As shown below, the asli operation takes a single argument in rsrc1 and an immediate modifier n and produces a result in rdest equal to rsrc1 arithmetically shifted left by n bits. The value of n must be between 0 and 31, inclusive. Zeros are shifted into the LSBs of rdest while the MSBs shifted out of rsrc1 are lost.
The asli operation optionally takes a guard, specified in rguard. If a guard is present, its LSB controls the modification of the destination register. If the LSB of rguard is 1, rdest is written; otherwise, rdest is unchanged.

Examples
  Initial values | Operation | Result
  r60 = 0x20 | asli(3) r60 r90 | r90 <- 0x100
  r10 = 0, r60 = 0x20 | if r10 asli(3) r60 r100 | no change, since guard is false
  r20 = 1, r60 = 0x20 | if r20 asli(3) r60 r110 | r110 <- 0x100
  r70 = 0xfffffffc | asli(2) r70 r120 | r120 <- 0xfffffff0
  r80 = 0xe | asli(30) r80 r125 | r125 <- 0x80000000

[Figure: left shifter. The 32 bits of rsrc1 are shifted left by the amount n taken from the operation modifier and zeros fill the vacated LSBs of rdest; the intermediate result is drawn for n = 3.]

See also: asl, asr, asri, lsl, lsli, lsr, lsri, rol, roli
asr
Arithmetic shift right

Syntax
  [ if rguard ] asr rsrc1 rsrc2 rdest

Function
  if rguard then {
    n <- rsrc2<4:0>
    rdest<31:31-n> <- rsrc1<31>
    rdest<30-n:0> <- rsrc1<30:n>
    if rsrc2<31:5> != 0 {
      rdest <- rsrc1<31>
    }
  }

Attributes
  Function unit: shifter
  Operation code: 18
  Number of operands: 2
  Modifier: No
  Modifier range: -
  Latency: 1
  Issue slots: 1, 2

Description
As shown below, the asr operation takes two arguments, rsrc1 and rsrc2. rsrc2 specifies an unsigned shift amount, and rsrc1 is arithmetically shifted right by this amount. A nonzero value in rsrc2<31:5> is treated as a shift by 32 or more bits, so every bit of rdest is set to the sign bit of rsrc1. The MSB (sign bit) of rsrc1 is replicated as needed to fill vacated bits from the left.
The asr operation optionally takes a guard, specified in rguard. If a guard is present, its LSB controls the modification of the destination register. If the LSB of rguard is 1, rdest is written; otherwise, rdest is unchanged.

Examples
  Initial values | Operation | Result
  r30 = 0x7008000f, r20 = 1 | asr r30 r20 r50 | r50 <- 0x38040007
  r30 = 0x7008000f, r42 = 2 | asr r30 r42 r60 | r60 <- 0x1c020003
  r10 = 0, r30 = 0x7008000f, r44 = 4 | if r10 asr r30 r44 r70 | no change, since guard is false
  r20 = 1, r30 = 0x7008000f, r44 = 4 | if r20 asr r30 r44 r80 | r80 <- 0x07008000
  r40 = 0x80030007, r44 = 4 | asr r40 r44 r90 | r90 <- 0xf8003000
  r30 = 0x7008000f, r45 = 0x1f | asr r30 r45 r100 | r100 <- 0x00000000
  r30 = 0x8008000f, r45 = 0x1f | asr r30 r45 r100 | r100 <- 0xffffffff
  r30 = 0x7008000f, r45 = 0x20 | asr r30 r45 r100 | r100 <- 0x00000000
  r30 = 0x8008000f, r45 = 0x20 | asr r30 r45 r100 | r100 <- 0xffffffff
  r30 = 0x8008000f, r45 = 0x23 | asr r30 r45 r100 | r100 <- 0xffffffff

[Figure: right shifter. The 32 bits of rsrc1 are shifted right by the amount in rsrc2 and the sign bit S is replicated into the vacated MSBs of rdest; the intermediate result is drawn for n = 3.]

See also: asl, asli, asri, lsl, lsli, lsr, lsri, rol, roli
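Sign replication and the saturating behavior for large shift amounts can be checked with a short model. The Python sketch below follows the asr function above under the assumption that registers are unsigned 32-bit values; it relies on Python's `>>` being an arithmetic (sign-preserving) shift for negative integers. The function name is illustrative and the guard is left out.

```python
MASK32 = 0xFFFFFFFF


def asr(rsrc1, rsrc2):
    """Model of asr: arithmetic right shift by rsrc2<4:0>; all sign bits if rsrc2<31:5> != 0."""
    rsrc1 &= MASK32
    rsrc2 &= MASK32
    sign = rsrc1 >> 31
    if rsrc2 >> 5:                      # shift by 32 or more: every bit is the sign bit
        return MASK32 if sign else 0
    n = rsrc2 & 0x1F
    # Reinterpret the register as a signed value so that >> replicates the sign bit.
    signed = rsrc1 - (1 << 32) if sign else rsrc1
    return (signed >> n) & MASK32


if __name__ == "__main__":
    print(hex(asr(0x80030007, 4)))
    print(hex(asr(0x8008000F, 0x23)))
```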
asri
Arithmetic shift right by immediate amount

Syntax
  [ if rguard ] asri(n) rsrc1 rdest

Function
  if rguard then {
    rdest<31:31-n> <- rsrc1<31>
    rdest<30-n:0> <- rsrc1<30:n>
  }

Attributes
  Function unit: shifter
  Operation code: 10
  Number of operands: 1
  Modifier: 7 bits
  Modifier range: 0..31
  Latency: 1
  Issue slots: 1, 2

Description
As shown below, the asri operation takes a single argument in rsrc1 and an immediate modifier n and produces a result in rdest that is equal to rsrc1 arithmetically shifted right by n bits. The value of n must be between 0 and 31, inclusive. The MSB (sign bit) of rsrc1 is replicated as needed to fill vacated bits from the left.
The asri operation optionally takes a guard, specified in rguard. If a guard is present, its LSB controls the modification of the destination register. If the LSB of rguard is 1, rdest is written; otherwise, rdest is unchanged.

Examples
  Initial values | Operation | Result
  r30 = 0x7008000f | asri(1) r30 r50 | r50 <- 0x38040007
  r30 = 0x7008000f | asri(2) r30 r60 | r60 <- 0x1c020003
  r10 = 0, r30 = 0x7008000f | if r10 asri(4) r30 r70 | no change, since guard is false
  r20 = 1, r30 = 0x7008000f | if r20 asri(4) r30 r80 | r80 <- 0x07008000
  r40 = 0x80030007 | asri(4) r40 r90 | r90 <- 0xf8003000
  r30 = 0x7008000f | asri(31) r30 r100 | r100 <- 0x00000000
  r40 = 0x80030007 | asri(31) r40 r110 | r110 <- 0xffffffff

[Figure: right shifter. The 32 bits of rsrc1 are shifted right by the amount n taken from the operation modifier and the sign bit S is replicated into the vacated MSBs of rdest; the intermediate result is drawn for n = 3.]

See also: asl, asli, asr, lsl, lsli, lsr, lsri, rol, roli
bitand
Bitwise logical AND

Syntax
  [ if rguard ] bitand rsrc1 rsrc2 rdest

Function
  if rguard then
    rdest <- rsrc1 & rsrc2

Attributes
  Function unit: alu
  Operation code: 16
  Number of operands: 2
  Modifier: No
  Modifier range: -
  Latency: 1
  Issue slots: 1, 2, 3, 4, 5

Description
The bitand operation computes the bitwise, logical AND of the first and second arguments, rsrc1 and rsrc2. The result is stored in the destination register, rdest.
The bitand operation optionally takes a guard, specified in rguard. If a guard is present, its LSB controls the modification of the destination register. If the LSB of rguard is 1, rdest is written; otherwise, rdest is not changed.

Examples
  Initial values | Operation | Result
  r30 = 0xf310ffff, r40 = 0xffff0000 | bitand r30 r40 r90 | r90 <- 0xf3100000
  r10 = 0, r50 = 0x88888888 | if r10 bitand r30 r50 r80 | no change, since guard is false
  r20 = 1, r30 = 0xf310ffff, r50 = 0x88888888 | if r20 bitand r30 r50 r100 | r100 <- 0x80008888
  r60 = 0x11119999, r50 = 0x88888888 | bitand r60 r50 r110 | r110 <- 0x00008888
  r70 = 0x55555555, r30 = 0xf310ffff | bitand r70 r30 r120 | r120 <- 0x51105555

See also: bitor, bitxor, bitandinv
bitandinv
Bitwise logical AND NOT

Syntax
  [ if rguard ] bitandinv rsrc1 rsrc2 rdest

Function
  if rguard then
    rdest <- rsrc1 & ~rsrc2

Attributes
  Function unit: alu
  Operation code: 49
  Number of operands: 2
  Modifier: No
  Modifier range: -
  Latency: 1
  Issue slots: 1, 2, 3, 4, 5

Description
The bitandinv operation computes the bitwise, logical AND of the first argument, rsrc1, with the 1's complement of the second argument, rsrc2. The result is stored in the destination register, rdest.
The bitandinv operation optionally takes a guard, specified in rguard. If a guard is present, its LSB controls the modification of the destination register. If the LSB of rguard is 1, rdest is written; otherwise, rdest is not changed.

Examples
  Initial values | Operation | Result
  r30 = 0xf310ffff, r40 = 0xffff0000 | bitandinv r30 r40 r90 | r90 <- 0x0000ffff
  r10 = 0, r50 = 0x88888888 | if r10 bitandinv r30 r50 r80 | no change, since guard is false
  r20 = 1, r30 = 0xf310ffff, r50 = 0x88888888 | if r20 bitandinv r30 r50 r100 | r100 <- 0x73107777
  r60 = 0x11119999, r50 = 0x88888888 | bitandinv r60 r50 r110 | r110 <- 0x11111111
  r70 = 0x55555555, r30 = 0xf310ffff | bitandinv r70 r30 r120 | r120 <- 0x04450000

See also: bitand, bitor, bitxor
bitinv
Bitwise logical NOT

Syntax
  [ if rguard ] bitinv rsrc1 rdest

Function
  if rguard then
    rdest <- ~rsrc1

Attributes
  Function unit: alu
  Operation code: 50
  Number of operands: 1
  Modifier: No
  Modifier range: -
  Latency: 1
  Issue slots: 1, 2, 3, 4, 5

Description
The bitinv operation computes the bitwise, logical NOT of the argument rsrc1 and writes the result into rdest.
The bitinv operation optionally takes a guard, specified in rguard. If a guard is present, its LSB controls the modification of the destination register. If the LSB of rguard is 1, rdest is written; otherwise, rdest is not changed.

Examples
  Initial values | Operation | Result
  r30 = 0xf310ffff | bitinv r30 r60 | r60 <- 0x0cef0000
  r10 = 0, r40 = 0xffff0000 | if r10 bitinv r40 r70 | no change, since guard is false
  r20 = 1, r40 = 0xffff0000 | if r20 bitinv r40 r100 | r100 <- 0x0000ffff
  r50 = 0x88888888 | bitinv r50 r110 | r110 <- 0x77777777

See also: bitand, bitandinv, bitor, bitxor
bitor
Bitwise logical OR

Syntax
  [ if rguard ] bitor rsrc1 rsrc2 rdest

Function
  if rguard then
    rdest <- rsrc1 | rsrc2

Attributes
  Function unit: alu
  Operation code: 17
  Number of operands: 2
  Modifier: No
  Modifier range: -
  Latency: 1
  Issue slots: 1, 2, 3, 4, 5

Description
The bitor operation computes the bitwise, logical OR of the first and second arguments, rsrc1 and rsrc2. The result is stored in the destination register, rdest.
The bitor operation optionally takes a guard, specified in rguard. If a guard is present, its LSB controls the modification of the destination register. If the LSB of rguard is 1, rdest is written; otherwise, rdest is not changed.

Examples
  Initial values | Operation | Result
  r30 = 0xf310ffff, r40 = 0xffff0000 | bitor r30 r40 r90 | r90 <- 0xffffffff
  r10 = 0, r50 = 0x88888888 | if r10 bitor r30 r50 r80 | no change, since guard is false
  r20 = 1, r30 = 0xf310ffff, r50 = 0x88888888 | if r20 bitor r30 r50 r100 | r100 <- 0xfb98ffff
  r60 = 0x11119999, r50 = 0x88888888 | bitor r60 r50 r110 | r110 <- 0x99999999
  r70 = 0x55555555, r30 = 0xf310ffff | bitor r70 r30 r120 | r120 <- 0xf755ffff

See also: bitand, bitandinv, bitinv, bitxor
bitxor
Bitwise logical exclusive-OR

Syntax
  [ if rguard ] bitxor rsrc1 rsrc2 rdest

Function
  if rguard then
    rdest <- rsrc1 ⊕ rsrc2

Attributes
  Function unit: alu
  Operation code: 48
  Number of operands: 2
  Modifier: No
  Modifier range: -
  Latency: 1
  Issue slots: 1, 2, 3, 4, 5

Description
The bitxor operation computes the bitwise, logical exclusive-OR of the first and second arguments, rsrc1 and rsrc2. The result is stored in the destination register, rdest.
The bitxor operation optionally takes a guard, specified in rguard. If a guard is present, its LSB controls the modification of the destination register. If the LSB of rguard is 1, rdest is written; otherwise, rdest is not changed.

Examples
  Initial values | Operation | Result
  r30 = 0xf310ffff, r40 = 0xffff0000 | bitxor r30 r40 r90 | r90 <- 0x0cefffff
  r10 = 0, r50 = 0x88888888 | if r10 bitxor r30 r50 r80 | no change, since guard is false
  r20 = 1, r30 = 0xf310ffff, r50 = 0x88888888 | if r20 bitxor r30 r50 r100 | r100 <- 0x7b987777
  r60 = 0x11119999, r50 = 0x88888888 | bitxor r60 r50 r110 | r110 <- 0x99991111
  r70 = 0x55555555, r30 = 0xf310ffff | bitxor r70 r30 r120 | r120 <- 0xa645aaaa

See also: bitand, bitandinv, bitinv, bitor
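The five logical operations (bitand, bitandinv, bitinv, bitor, bitxor) map directly onto host bitwise operators once results are truncated to 32 bits. The Python sketch below is one way to model them for checking example values; the function names simply reuse the mnemonics, the guard is omitted, and the 32-bit truncation is an assumption about how a host model would represent registers.

```python
MASK32 = 0xFFFFFFFF


def bitand(rsrc1, rsrc2):
    return rsrc1 & rsrc2 & MASK32


def bitandinv(rsrc1, rsrc2):
    # AND with the 1's complement of the second argument
    return rsrc1 & ~rsrc2 & MASK32


def bitinv(rsrc1):
    # Python's ~ yields a negative number, so mask back to 32 bits
    return ~rsrc1 & MASK32


def bitor(rsrc1, rsrc2):
    return (rsrc1 | rsrc2) & MASK32


def bitxor(rsrc1, rsrc2):
    return (rsrc1 ^ rsrc2) & MASK32


if __name__ == "__main__":
    a, b = 0xF310FFFF, 0x88888888
    # exclusive-OR expressed with the other operations
    print(hex(bitxor(a, b)), hex(bitor(bitandinv(a, b), bitandinv(b, a))))
```

The last line illustrates a standard identity: an exclusive-OR equals the OR of the two AND-NOT results taken in each direction.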
borrow
compute borrow bit from unsigned subtract
pseudo-op for ugtr

syntax
    [ if rguard ] borrow rsrc1 rsrc2 -> rdest

function
    if rguard then {
        if rsrc1 < rsrc2 then
            rdest <- 1
        else
            rdest <- 0
    }

attributes
    function unit: alu
    operation code: 33
    number of operands: 2
    modifier: no
    modifier range: -
    latency: 1
    issue slots: 1, 2, 3, 4, 5

description
the borrow operation is a pseudo operation transformed by the scheduler into an ugtr with reversed arguments. (note: pseudo operations cannot be used in assembly source files.)
the borrow operation computes the unsigned difference of the first and second arguments, rsrc1 - rsrc2. if the difference generates a borrow (if rsrc2 > rsrc1), 1 is stored in the destination register, rdest; otherwise, rdest is set to 0.
the borrow operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

examples
initial values | operation | result
r70 = 2, r30 = 0xfffffffc | borrow r70 r30 -> r80 | r80 <- 1
r10 = 0, r70 = 2, r30 = 0xfffffffc | if r10 borrow r70 r30 -> r90 | no change, since guard is false
r20 = 1, r70 = 2, r30 = 0xfffffffc | if r20 borrow r70 r30 -> r100 | r100 <- 1
r60 = 4, r30 = 0xfffffffc | borrow r60 r30 -> r110 | r110 <- 1
r30 = 0xfffffffc | borrow r30 r30 -> r120 | r120 <- 0

see also
ugtr carry
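
The relationship between borrow and ugtr described above can be sketched in C. This is only an illustrative model, not code from the data book: the function names are hypothetical, the guard is omitted, and ugtr is assumed to return 1 when its first argument is greater than its second as unsigned values.

```c
#include <assert.h>
#include <stdint.h>

/* ugtr model (assumption): 1 when a > b as unsigned 32-bit values, else 0. */
static uint32_t ugtr(uint32_t a, uint32_t b)
{
    return a > b ? 1u : 0u;
}

/* borrow model: rsrc1 - rsrc2 borrows exactly when rsrc2 > rsrc1,
 * so it is ugtr with the arguments reversed. */
static uint32_t borrow(uint32_t rsrc1, uint32_t rsrc2)
{
    return ugtr(rsrc2, rsrc1);
}

/* Re-checks three rows of the examples table. */
static void borrow_examples(void)
{
    assert(borrow(2u, 0xfffffffcu) == 1u);
    assert(borrow(4u, 0xfffffffcu) == 1u);
    assert(borrow(0xfffffffcu, 0xfffffffcu) == 0u);
}
```

Equal operands produce no borrow, which matches the last row of the table.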
carry
compute carry bit from unsigned add

syntax
    [ if rguard ] carry rsrc1 rsrc2 -> rdest

function
    if rguard then {
        if (rsrc1 + rsrc2) < 2^32 then
            rdest <- 0
        else
            rdest <- 1
    }

attributes
    function unit: alu
    operation code: 45
    number of operands: 2
    modifier: no
    modifier range: -
    latency: 1
    issue slots: 1, 2, 3, 4, 5

description
the carry operation computes the unsigned sum of the first and second arguments, rsrc1 + rsrc2. if the sum generates a carry (if the sum is greater than 2^32 - 1), 1 is stored in the destination register, rdest; otherwise, rdest is set to 0.
the carry operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

examples
initial values | operation | result
r70 = 2, r30 = 0xfffffffc | carry r70 r30 -> r80 | r80 <- 0
r10 = 0, r70 = 2, r30 = 0xfffffffc | if r10 carry r70 r30 -> r90 | no change, since guard is false
r20 = 1, r70 = 2, r30 = 0xfffffffc | if r20 carry r70 r30 -> r100 | r100 <- 0
r60 = 4, r30 = 0xfffffffc | carry r60 r30 -> r110 | r110 <- 1
r30 = 0xfffffffc | carry r30 r30 -> r120 | r120 <- 1

see also
borrow
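
A typical use of a carry bit is multi-word addition. The C sketch below models carry (guard omitted) and shows one way it could feed a 64-bit add built from 32-bit halves. The helper add64 and all names are hypothetical and are not taken from the data book.

```c
#include <assert.h>
#include <stdint.h>

/* carry model: 1 when the unsigned sum rsrc1 + rsrc2 exceeds 2^32 - 1. */
static uint32_t carry(uint32_t rsrc1, uint32_t rsrc2)
{
    uint64_t sum = (uint64_t)rsrc1 + (uint64_t)rsrc2;
    return sum > 0xffffffffu ? 1u : 0u;
}

/* Hypothetical helper: 64-bit add from 32-bit halves. The low words are
 * added modulo 2^32 and their carry is propagated into the high words. */
static void add64(uint32_t a_hi, uint32_t a_lo,
                  uint32_t b_hi, uint32_t b_lo,
                  uint32_t *sum_hi, uint32_t *sum_lo)
{
    *sum_lo = a_lo + b_lo;
    *sum_hi = a_hi + b_hi + carry(a_lo, b_lo);
}

/* Re-checks rows of the examples table and one 64-bit sum. */
static void carry_examples(void)
{
    uint32_t hi, lo;
    assert(carry(2u, 0xfffffffcu) == 0u);
    assert(carry(4u, 0xfffffffcu) == 1u);
    assert(carry(0xfffffffcu, 0xfffffffcu) == 1u);
    add64(0u, 0xffffffffu, 0u, 1u, &hi, &lo);  /* 0xffffffff + 1 */
    assert(hi == 1u && lo == 0u);
}
```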
curcycles
read current clock cycle counter, least-significant word

syntax
    [ if rguard ] curcycles -> rdest

function
    if rguard then
        rdest <- cccount<31:0>

attributes
    function unit: fcomp
    operation code: 162
    number of operands: 0
    modifier: no
    modifier range: -
    latency: 1
    issue slots: 3

description
refer to section 3.1.5, "cccount - clock cycle counter," for a description of the cccount operation. the curcycles operation copies the current low 32 bits of the master clock cycle counter (cccount) to the destination register, rdest. the master cccount increments on all cycles (processor-stall and non-stall) if pcsw.cs = 1; otherwise, the counter increments only on non-stall cycles.
the curcycles operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

examples
initial values | operation | result
cccount_hr = 0xabcdefff12345678 | curcycles -> r60 | r60 <- 0x12345678
r10 = 0, cccount_hr = 0xabcdefff12345678 | if r10 curcycles -> r70 | no change, since guard is false
r20 = 1, cccount_hr = 0xabcdefff12345678 | if r20 curcycles -> r100 | r100 <- 0x12345678

see also
cycles hicycles writepcsw
cycles
read clock cycle counter, least-significant word

syntax
    [ if rguard ] cycles -> rdest

function
    if rguard then
        rdest <- cccount<31:0>

attributes
    function unit: fcomp
    operation code: 154
    number of operands: 0
    modifier: no
    modifier range: -
    latency: 1
    issue slots: 3

description
refer to section 3.1.5, "cccount - clock cycle counter," for a description of the cccount operation. the cycles operation copies the low 32 bits of the slave register of the clock cycle counter (cccount) to the destination register, rdest. the contents of the master counter are transferred to the slave cccount register only on a successful interruptible jump and on processor reset. thus, if cycles and hicycles are executed without intervening interruptible jumps, the operation pair is guaranteed to be a coherent sample of the master clock-cycle counter. the master counter increments on all cycles (processor-stall and non-stall) if pcsw.cs = 1; otherwise, the counter increments only on non-stall cycles.
the cycles operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

examples
initial values | operation | result
cccount_hr = 0xabcdefff12345678 | cycles -> r60 | r60 <- 0x12345678
r10 = 0, cccount_hr = 0xabcdefff12345678 | if r10 cycles -> r70 | no change, since guard is false
r20 = 1, cccount_hr = 0xabcdefff12345678 | if r20 cycles -> r100 | r100 <- 0x12345678

see also
hicycles curcycles writepcsw
dcb
data cache copy back

syntax
    [ if rguard ] dcb(d) rsrc1

function
    if rguard then {
        addr <- rsrc1 + d
        if dcache_valid_addr(addr) && dcache_dirty_addr(addr) then {
            dcache_copyback_addr(addr)
            dcache_reset_dirty_addr(addr)
        }
    }

attributes
    function unit: dmemspec
    operation code: 205
    number of operands: 1
    modifier: 7 bits
    modifier range: -256..252 by 4
    latency: 3
    issue slots: 5

description
the dcb operation causes a block in the data cache to be copied back to main memory if the block is marked dirty and valid; the block's dirty bit is then reset. the target block of dcb is the block in the data cache that contains the byte addressed by rsrc1 + d. the d value is an opcode modifier, must be in the range -256 to 252 inclusive, and must be a multiple of 4.
a valid copy of the target block remains in the cache. stall cycles are taken as necessary to complete the copy-back operation. if the target block is not dirty or if the block is not in the cache, dcb has no effect and no stall cycles are taken.
dcb has no effect on blocks that are in the non-cacheable sdram aperture. dcb does not change the replacement status of data-cache blocks.
dcb ensures coherency between caches and main memory by discarding all pending prefetch operations and by causing all non-empty copyback buffers to be emptied to main memory.
the dcb operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls whether the operation is carried out. if the lsb of rguard is 1, the operation is carried out; otherwise, it is not carried out.

examples
initial values | operation | result
 | dcb(0) r30 |
r10 = 0 | if r10 dcb(4) r40 | no change and no stall cycles, since guard is false
r20 = 1 | if r20 dcb(8) r50 |

see also
dinvalid
dinvalid
invalidate data cache block

syntax
    [ if rguard ] dinvalid(d) rsrc1

function
    if rguard then {
        addr <- rsrc1 + d
        if dcache_valid_addr(addr) then {
            dcache_reset_valid_addr(addr)
            dcache_reset_dirty_addr(addr)
        }
    }

attributes
    function unit: dmemspec
    operation code: 206
    number of operands: 1
    modifier: 7 bits
    modifier range: -256..252 by 4
    latency: 3
    issue slots: 5

description
the dinvalid operation resets the valid and dirty bits of a block in the data cache. regardless of the block's dirty bit, the block is not written back to main memory. the target block of dinvalid is the block in the data cache that contains the byte addressed by rsrc1 + d. the d value is an opcode modifier, must be in the range -256 to 252 inclusive, and must be a multiple of 4.
stall cycles are taken as necessary to complete the invalidate operation. if the target block is not in the cache, dinvalid has no effect and no stall cycles are taken.
dinvalid has no effect on blocks that are in the non-cacheable sdram aperture. dinvalid does clear the valid bits of locked blocks. dinvalid does not change the replacement status of data-cache blocks.
dinvalid ensures coherency between caches and main memory by discarding all pending prefetch operations and by causing all non-empty copyback buffers to be emptied to main memory.
the dinvalid operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls whether the operation is carried out. if the lsb of rguard is 1, the operation is carried out; otherwise, it is not carried out.

examples
initial values | operation | result
 | dinvalid(0) r30 |
r10 = 0 | if r10 dinvalid(4) r40 | no change and no stall cycles, since guard is false
r20 = 1 | if r20 dinvalid(8) r50 |

see also
dcb
dspiabs
clipped signed absolute value
pseudo-op for h_dspiabs

syntax
    [ if rguard ] dspiabs rsrc1 -> rdest

function
    if rguard then {
        if rsrc1 >= 0 then
            rdest <- rsrc1
        else if rsrc1 = 0x80000000 then
            rdest <- 0x7fffffff
        else
            rdest <- -rsrc1
    }

attributes
    function unit: dspalu
    operation code: 65
    number of operands: 1
    modifier: no
    modifier range: -
    latency: 2
    issue slots: 1, 3

description
the dspiabs operation is a pseudo operation transformed by the scheduler into an h_dspiabs with a constant first argument zero and second argument equal to the dspiabs argument. (note: pseudo operations cannot be used in assembly source files.) the dspiabs operation computes the absolute value of rsrc1, clips the result into the range [2^31-1..0] (or [0x7fffffff..0]), and stores the clipped value into rdest. all values are signed integers.
the dspiabs operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

examples
initial values | operation | result
r30 = 0xffffffff | dspiabs r30 -> r60 | r60 <- 0x00000001
r10 = 0, r40 = 0x80000001 | if r10 dspiabs r40 -> r70 | no change, since guard is false
r20 = 1, r40 = 0x80000001 | if r20 dspiabs r40 -> r100 | r100 <- 0x7fffffff
r50 = 0x80000000 | dspiabs r50 -> r80 | r80 <- 0x7fffffff
r90 = 0x7fffffff | dspiabs r90 -> r110 | r110 <- 0x7fffffff

see also
h_dspiabs h_dspidualabs dspiadd dspimul dspisub dspuadd dspumul dspusub
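
The clipping case matters because a plain two's-complement negation of 0x80000000 overflows. A minimal C model of the function above (guard omitted; the C function name is only illustrative) could look like this:

```c
#include <assert.h>
#include <stdint.h>

/* dspiabs model: absolute value clipped into [0, 2^31 - 1].
 * INT32_MIN has no positive counterpart, so it saturates to INT32_MAX. */
static int32_t dspiabs(int32_t rsrc1)
{
    if (rsrc1 >= 0)
        return rsrc1;
    if (rsrc1 == INT32_MIN)
        return INT32_MAX;
    return -rsrc1;
}

/* Re-checks rows of the examples table using signed constants:
 * 0xffffffff is -1, 0x80000001 is INT32_MIN + 1, 0x80000000 is INT32_MIN. */
static void dspiabs_examples(void)
{
    assert(dspiabs(-1) == 1);
    assert(dspiabs(INT32_MIN + 1) == INT32_MAX);
    assert(dspiabs(INT32_MIN) == INT32_MAX);
    assert(dspiabs(INT32_MAX) == INT32_MAX);
}
```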
dspiadd
clipped signed add

syntax
    [ if rguard ] dspiadd rsrc1 rsrc2 -> rdest

function
    if rguard then {
        temp <- sign_ext32to64(rsrc1) + sign_ext32to64(rsrc2)
        if temp < 0xffffffff80000000 then
            rdest <- 0x80000000
        else if temp > 0x000000007fffffff then
            rdest <- 0x7fffffff
        else
            rdest <- temp
    }

attributes
    function unit: dspalu
    operation code: 66
    number of operands: 2
    modifier: no
    modifier range: -
    latency: 2
    issue slots: 1, 3

description
as shown below, the dspiadd operation computes the sum rsrc1 + rsrc2, clips the result into the 32-bit signed range [2^31-1..-2^31] (or [0x7fffffff..0x80000000]), and stores the clipped value into rdest. all values are signed integers.
the dspiadd operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

[figure: rsrc1 and rsrc2 (both signed) are added to a full-precision 33-bit signed result, which is clipped to [2^31-1..-2^31] and written to rdest (signed)]

examples
initial values | operation | result
r30 = 0x1200, r40 = 0xff | dspiadd r30 r40 -> r60 | r60 <- 0x12ff
r10 = 0, r30 = 0x1200, r40 = 0xff | if r10 dspiadd r30 r40 -> r80 | no change, since guard is false
r20 = 1, r30 = 0x1200, r40 = 0xff | if r20 dspiadd r30 r40 -> r100 | r100 <- 0x12ff
r50 = 0x7fffffff, r90 = 1 | dspiadd r50 r90 -> r110 | r110 <- 0x7fffffff
r70 = 0x80000000, r80 = 0xffffffff | dspiadd r70 r80 -> r120 | r120 <- 0x80000000

see also
dspiabs dspimul dspisub dspuadd dspumul dspusub
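
The widen, add, then clip sequence in the function above maps directly onto 64-bit arithmetic in C. The sketch below is an illustrative model only (guard omitted, function names hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Clip a 64-bit signed value into the 32-bit signed range. */
static int32_t clip_s32(int64_t v)
{
    if (v < INT32_MIN)
        return INT32_MIN;
    if (v > INT32_MAX)
        return INT32_MAX;
    return (int32_t)v;
}

/* dspiadd model: the sum is formed at full precision, then clipped,
 * so it saturates where a plain 32-bit add would wrap around. */
static int32_t dspiadd(int32_t rsrc1, int32_t rsrc2)
{
    return clip_s32((int64_t)rsrc1 + (int64_t)rsrc2);
}

/* Re-checks rows of the examples table (0xffffffff is -1 as a signed value). */
static void dspiadd_examples(void)
{
    assert(dspiadd(0x1200, 0xff) == 0x12ff);
    assert(dspiadd(INT32_MAX, 1) == INT32_MAX);
    assert(dspiadd(INT32_MIN, -1) == INT32_MIN);
}
```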
dspidualabs
dual clipped absolute value of signed 16-bit halfwords
pseudo-op for h_dspidualabs

syntax
    [ if rguard ] dspidualabs rsrc1 -> rdest

function
    if rguard then {
        temp1 <- sign_ext16to32(rsrc1<15:0>)
        temp2 <- sign_ext16to32(rsrc1<31:16>)
        if temp1 = 0xffff8000 then temp1 <- 0x7fff
        if temp2 = 0xffff8000 then temp2 <- 0x7fff
        if temp1 < 0 then temp1 <- -temp1
        if temp2 < 0 then temp2 <- -temp2
        rdest<31:16> <- temp2<15:0>
        rdest<15:0> <- temp1<15:0>
    }

attributes
    function unit: dspalu
    operation code: 72
    number of operands: 1
    modifier: no
    modifier range: -
    latency: 2
    issue slots: 1, 3

description
the dspidualabs operation is a pseudo operation transformed by the scheduler into an h_dspidualabs with a constant zero as first argument and the dspidualabs argument as second argument. (note: pseudo operations cannot be used in assembly source files.) the dspidualabs operation performs two 16-bit clipped, signed absolute value computations separately on the high and low 16-bit halfwords of rsrc1. both absolute values are clipped into the range [0x0..0x7fff] and written into the corresponding halfwords of rdest. all values are signed 16-bit integers.
the dspidualabs operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

examples
initial values | operation | result
r30 = 0xffff0032 | dspidualabs r30 -> r60 | r60 <- 0x00010032
r10 = 0, r40 = 0x80008001 | if r10 dspidualabs r40 -> r70 | no change, since guard is false
r20 = 1, r40 = 0x80008001 | if r20 dspidualabs r40 -> r100 | r100 <- 0x7fff7fff
r50 = 0x0032ffff | dspidualabs r50 -> r80 | r80 <- 0x00320001
r90 = 0x7fffffff | dspidualabs r90 -> r110 | r110 <- 0x7fff0001

see also
h_dspidualabs dspiabs dspidualadd dspidualmul dspidualsub
dspidualadd
dual clipped add of signed 16-bit halfwords

syntax
    [ if rguard ] dspidualadd rsrc1 rsrc2 -> rdest

function
    if rguard then {
        temp1 <- sign_ext16to32(rsrc1<15:0>) + sign_ext16to32(rsrc2<15:0>)
        temp2 <- sign_ext16to32(rsrc1<31:16>) + sign_ext16to32(rsrc2<31:16>)
        if temp1 < 0xffff8000 then temp1 <- 0x8000
        if temp2 < 0xffff8000 then temp2 <- 0x8000
        if temp1 > 0x7fff then temp1 <- 0x7fff
        if temp2 > 0x7fff then temp2 <- 0x7fff
        rdest<31:16> <- temp2<15:0>
        rdest<15:0> <- temp1<15:0>
    }

attributes
    function unit: dspalu
    operation code: 70
    number of operands: 2
    modifier: no
    modifier range: -
    latency: 2
    issue slots: 1, 3

description
as shown below, the dspidualadd operation computes two 16-bit clipped, signed sums separately on the two pairs of high and low 16-bit halfwords of rsrc1 and rsrc2. both sums are clipped into the range [2^15-1..-2^15] (or [0x7fff..0x8000]) and written into the corresponding halfwords of rdest. all values are signed 16-bit integers.
the dspidualadd operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

[figure: the high and low signed halfwords of rsrc1 and rsrc2 are added pairwise to two full-precision 17-bit signed sums; each sum is clipped to [2^15-1..-2^15] and written to the corresponding signed halfword of rdest]

examples
initial values | operation | result
r30 = 0x12340032, r40 = 0x00010002 | dspidualadd r30 r40 -> r60 | r60 <- 0x12350034
r10 = 0, r30 = 0x12340032, r40 = 0x00010002 | if r10 dspidualadd r30 r40 -> r70 | no change, since guard is false
r20 = 1, r30 = 0x12340032, r40 = 0x00010002 | if r20 dspidualadd r30 r40 -> r100 | r100 <- 0x12350034
r50 = 0x80000001, r80 = 0xffff7fff | dspidualadd r50 r80 -> r90 | r90 <- 0x80007fff
r110 = 0x00017fff, r120 = 0x7fff7fff | dspidualadd r110 r120 -> r125 | r125 <- 0x7fff7fff

see also
dspidualabs dspidualmul dspidualsub dspiabs
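
The dual-halfword operations all follow the same pattern: unpack and sign-extend each halfword, compute at full precision, clip, and repack. A C sketch of that pattern for dspidualadd follows. It is an illustrative model with the guard omitted, and the helper names sext16 and clip_s16 are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low 16 bits of h to a 32-bit signed value. */
static int32_t sext16(uint32_t h)
{
    h &= 0xffffu;
    return (h & 0x8000u) ? (int32_t)h - 0x10000 : (int32_t)h;
}

/* Clip a signed value into the 16-bit signed range. */
static int32_t clip_s16(int32_t v)
{
    if (v < -32768)
        return -32768;
    if (v > 32767)
        return 32767;
    return v;
}

/* dspidualadd model: two independent clipped 16-bit adds on packed halfwords. */
static uint32_t dspidualadd(uint32_t rsrc1, uint32_t rsrc2)
{
    int32_t lo = clip_s16(sext16(rsrc1) + sext16(rsrc2));
    int32_t hi = clip_s16(sext16(rsrc1 >> 16) + sext16(rsrc2 >> 16));
    return (((uint32_t)hi & 0xffffu) << 16) | ((uint32_t)lo & 0xffffu);
}

/* Re-checks rows of the examples table. */
static void dspidualadd_examples(void)
{
    assert(dspidualadd(0x12340032u, 0x00010002u) == 0x12350034u);
    assert(dspidualadd(0x80000001u, 0xffff7fffu) == 0x80007fffu);
    assert(dspidualadd(0x00017fffu, 0x7fff7fffu) == 0x7fff7fffu);
}
```

Swapping the `+` for `-` or `*` in the two halfword expressions gives the corresponding model for the subtract and multiply variants, since a product of two 16-bit values still fits in 32 bits.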
dspidualmul
dual clipped multiply of signed 16-bit halfwords

syntax
    [ if rguard ] dspidualmul rsrc1 rsrc2 -> rdest

function
    if rguard then {
        temp1 <- sign_ext16to32(rsrc1<15:0>) * sign_ext16to32(rsrc2<15:0>)
        temp2 <- sign_ext16to32(rsrc1<31:16>) * sign_ext16to32(rsrc2<31:16>)
        if temp1 < 0xffff8000 then temp1 <- 0x8000
        if temp2 < 0xffff8000 then temp2 <- 0x8000
        if temp1 > 0x7fff then temp1 <- 0x7fff
        if temp2 > 0x7fff then temp2 <- 0x7fff
        rdest<31:16> <- temp2<15:0>
        rdest<15:0> <- temp1<15:0>
    }

attributes
    function unit: dspmul
    operation code: 95
    number of operands: 2
    modifier: no
    modifier range: -
    latency: 3
    issue slots: 2, 3

description
as shown below, the dspidualmul operation computes two 16-bit clipped, signed products separately on the two pairs of high and low 16-bit halfwords of rsrc1 and rsrc2. both products are clipped into the range [2^15-1..-2^15] (or [0x7fff..0x8000]) and written into the corresponding halfwords of rdest. all values are signed 16-bit integers.
the dspidualmul operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

[figure: the high and low signed halfwords of rsrc1 and rsrc2 are multiplied pairwise to two full-precision 32-bit signed products; each product is clipped to [2^15-1..-2^15] and written to the corresponding signed halfword of rdest]

examples
initial values | operation | result
r30 = 0x00020010, r40 = 0x00030020 | dspidualmul r30 r40 -> r60 | r60 <- 0x00060200
r10 = 0, r30 = 0x00020010, r40 = 0x00030020 | if r10 dspidualmul r30 r40 -> r70 | no change, since guard is false
r20 = 1, r30 = 0x00020010, r40 = 0x00030020 | if r20 dspidualmul r30 r40 -> r100 | r100 <- 0x00060200
r50 = 0x80000002, r80 = 0x00024000 | dspidualmul r50 r80 -> r90 | r90 <- 0x80007fff
r110 = 0x08000003, r120 = 0x00108001 | dspidualmul r110 r120 -> r125 | r125 <- 0x7fff8000

see also
dspidualabs dspidualadd dspidualsub dspiabs
dspidualsub
dual clipped subtract of signed 16-bit halfwords

syntax
    [ if rguard ] dspidualsub rsrc1 rsrc2 -> rdest

function
    if rguard then {
        temp1 <- sign_ext16to32(rsrc1<15:0>) - sign_ext16to32(rsrc2<15:0>)
        temp2 <- sign_ext16to32(rsrc1<31:16>) - sign_ext16to32(rsrc2<31:16>)
        if temp1 < 0xffff8000 then temp1 <- 0x8000
        if temp2 < 0xffff8000 then temp2 <- 0x8000
        if temp1 > 0x7fff then temp1 <- 0x7fff
        if temp2 > 0x7fff then temp2 <- 0x7fff
        rdest<31:16> <- temp2<15:0>
        rdest<15:0> <- temp1<15:0>
    }

attributes
    function unit: dspalu
    operation code: 71
    number of operands: 2
    modifier: no
    modifier range: -
    latency: 2
    issue slots: 1, 3

description
as shown below, the dspidualsub operation computes two 16-bit clipped, signed differences separately on the two pairs of high and low 16-bit halfwords of rsrc1 and rsrc2. both differences are clipped into the range [2^15-1..-2^15] (or [0x7fff..0x8000]) and written into the corresponding halfwords of rdest. all values are signed 16-bit integers.
the dspidualsub operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

[figure: the high and low signed halfwords of rsrc2 are subtracted pairwise from those of rsrc1 to give two full-precision 17-bit signed differences; each difference is clipped to [2^15-1..-2^15] and written to the corresponding signed halfword of rdest]

examples
initial values | operation | result
r30 = 0x12340032, r40 = 0x00010002 | dspidualsub r30 r40 -> r60 | r60 <- 0x12330030
r10 = 0, r30 = 0x12340032, r40 = 0x00010002 | if r10 dspidualsub r30 r40 -> r70 | no change, since guard is false
r20 = 1, r30 = 0x12340032, r40 = 0x00010002 | if r20 dspidualsub r30 r40 -> r100 | r100 <- 0x12330030
r50 = 0x80000001, r80 = 0x00018001 | dspidualsub r50 r80 -> r90 | r90 <- 0x80007fff
r110 = 0x00018001, r120 = 0x80010002 | dspidualsub r110 r120 -> r125 | r125 <- 0x7fff8000

see also
dspidualabs dspidualadd dspidualmul dspiabs
dspimul
clipped signed multiply

syntax
    [ if rguard ] dspimul rsrc1 rsrc2 -> rdest

function
    if rguard then {
        temp <- sign_ext32to64(rsrc1) * sign_ext32to64(rsrc2)
        if temp < 0xffffffff80000000 then
            rdest <- 0x80000000
        else if temp > 0x000000007fffffff then
            rdest <- 0x7fffffff
        else
            rdest <- temp<31:0>
    }

attributes
    function unit: ifmul
    operation code: 141
    number of operands: 2
    modifier: no
    modifier range: -
    latency: 3
    issue slots: 2, 3

description
as shown below, the dspimul operation computes the product rsrc1 * rsrc2, clips the result into the 32-bit range [2^31-1..-2^31] (or [0x7fffffff..0x80000000]), and stores the clipped value into rdest. all values are signed integers.
the dspimul operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

[figure: rsrc1 and rsrc2 (both signed) are multiplied to a full-precision 64-bit signed result, which is clipped to [2^31-1..-2^31] and written to rdest (signed)]

examples
initial values | operation | result
r30 = 0x10, r40 = 0x20 | dspimul r30 r40 -> r60 | r60 <- 0x200
r10 = 0, r30 = 0x10, r40 = 0x20 | if r10 dspimul r30 r40 -> r80 | no change, since guard is false
r20 = 1, r30 = 0x10, r40 = 0x20 | if r20 dspimul r30 r40 -> r100 | r100 <- 0x200
r50 = 0x40000000, r90 = 2 | dspimul r50 r90 -> r110 | r110 <- 0x7fffffff
r80 = 0xffffffff | dspimul r80 r80 -> r120 | r120 <- 0x1
r70 = 0x80000000, r90 = 2 | dspimul r70 r90 -> r120 | r120 <- 0x80000000

see also
dspiabs dspiadd dspisub dspuadd dspumul dspusub
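
For the multiply, the full-precision intermediate needs all 64 bits, because the product of two 32-bit signed values can reach 2^62. A small C model of the function above (guard omitted, name illustrative) is:

```c
#include <assert.h>
#include <stdint.h>

/* dspimul model: 64-bit signed product clipped into the 32-bit signed range.
 * The largest magnitude, INT32_MIN * INT32_MIN = 2^62, still fits in int64_t. */
static int32_t dspimul(int32_t rsrc1, int32_t rsrc2)
{
    int64_t product = (int64_t)rsrc1 * (int64_t)rsrc2;
    if (product < INT32_MIN)
        return INT32_MIN;
    if (product > INT32_MAX)
        return INT32_MAX;
    return (int32_t)product;
}

/* Re-checks rows of the examples table (0xffffffff is -1 as a signed value). */
static void dspimul_examples(void)
{
    assert(dspimul(0x10, 0x20) == 0x200);
    assert(dspimul(0x40000000, 2) == INT32_MAX);
    assert(dspimul(-1, -1) == 1);
    assert(dspimul(INT32_MIN, 2) == INT32_MIN);
}
```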
dspisub
clipped signed subtract

syntax
    [ if rguard ] dspisub rsrc1 rsrc2 -> rdest

function
    if rguard then {
        temp <- sign_ext32to64(rsrc1) - sign_ext32to64(rsrc2)
        if temp < 0xffffffff80000000 then
            rdest <- 0x80000000
        else if temp > 0x000000007fffffff then
            rdest <- 0x7fffffff
        else
            rdest <- temp<31:0>
    }

attributes
    function unit: dspalu
    operation code: 68
    number of operands: 2
    modifier: no
    modifier range: -
    latency: 2
    issue slots: 1, 3

description
as shown below, the dspisub operation computes the difference rsrc1 - rsrc2, clips the result into the 32-bit range [2^31-1..-2^31] (or [0x7fffffff..0x80000000]), and stores the clipped value into rdest. all values are signed integers.
the dspisub operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

[figure: rsrc2 is subtracted from rsrc1 (both signed) to give a full-precision 33-bit signed result, which is clipped to [2^31-1..-2^31] and written to rdest (signed)]

examples
initial values | operation | result
r30 = 0x1200, r40 = 0xff | dspisub r30 r40 -> r60 | r60 <- 0x1101
r10 = 0, r30 = 0x1200, r40 = 0xff | if r10 dspisub r30 r40 -> r80 | no change, since guard is false
r20 = 1, r30 = 0x1200, r40 = 0xff | if r20 dspisub r30 r40 -> r100 | r100 <- 0x1101
r50 = 0x7fffffff, r90 = 0xffffffff | dspisub r50 r90 -> r110 | r110 <- 0x7fffffff
r70 = 0x80000000, r80 = 1 | dspisub r70 r80 -> r120 | r120 <- 0x80000000

see also
dspiabs dspiadd dspimul dspuadd dspumul dspusub
dspuadd
clipped unsigned add

syntax
    [ if rguard ] dspuadd rsrc1 rsrc2 -> rdest

function
    if rguard then {
        temp <- zero_ext32to64(rsrc1) + zero_ext32to64(rsrc2)
        if (unsigned)temp > 0x00000000ffffffff then
            rdest <- 0xffffffff
        else
            rdest <- temp<31:0>
    }

attributes
    function unit: dspalu
    operation code: 67
    number of operands: 2
    modifier: no
    modifier range: -
    latency: 2
    issue slots: 1, 3

description
as shown below, the dspuadd operation computes the unsigned sum rsrc1 + rsrc2, clips the result into the unsigned range [2^32-1..0] (or [0xffffffff..0]), and stores the clipped value into rdest.
the dspuadd operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

[figure: rsrc1 and rsrc2 (both unsigned) are added to a full-precision 33-bit unsigned result, which is clipped to [2^32-1..0] and written to rdest (unsigned)]

examples
initial values | operation | result
r30 = 0x1200, r40 = 0xff | dspuadd r30 r40 -> r60 | r60 <- 0x12ff
r10 = 0, r30 = 0x1200, r40 = 0xff | if r10 dspuadd r30 r40 -> r80 | no change, since guard is false
r20 = 1, r30 = 0x1200, r40 = 0xff | if r20 dspuadd r30 r40 -> r100 | r100 <- 0x12ff
r50 = 0xffffffff, r90 = 1 | dspuadd r50 r90 -> r110 | r110 <- 0xffffffff
r70 = 0x80000001, r80 = 0x7fffffff | dspuadd r70 r80 -> r120 | r120 <- 0xffffffff

see also
dspiabs dspiadd dspimul dspisub dspumul dspusub
dspumul
clipped unsigned multiply

syntax
    [ if rguard ] dspumul rsrc1 rsrc2 -> rdest

function
    if rguard then {
        temp <- zero_ext32to64(rsrc1) * zero_ext32to64(rsrc2)
        if (unsigned)temp > 0x00000000ffffffff then
            rdest <- 0xffffffff
        else
            rdest <- temp<31:0>
    }

attributes
    function unit: ifmul
    operation code: 142
    number of operands: 2
    modifier: no
    modifier range: -
    latency: 3
    issue slots: 2, 3

description
as shown below, the dspumul operation computes the unsigned product rsrc1 * rsrc2, clips the result into the unsigned range [2^32-1..0] (or [0xffffffff..0]), and stores the clipped value into rdest.
the dspumul operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

[figure: rsrc1 and rsrc2 (both unsigned) are multiplied to a full-precision 64-bit unsigned result, which is clipped to [2^32-1..0] and written to rdest (unsigned)]

examples
initial values | operation | result
r30 = 0x10, r40 = 0x20 | dspumul r30 r40 -> r60 | r60 <- 0x200
r10 = 0, r30 = 0x10, r40 = 0x20 | if r10 dspumul r30 r40 -> r80 | no change, since guard is false
r20 = 1, r30 = 0x10, r40 = 0x20 | if r20 dspumul r30 r40 -> r100 | r100 <- 0x200
r50 = 0x40000000, r90 = 2 | dspumul r50 r90 -> r110 | r110 <- 0x80000000
r80 = 0xffffffff | dspumul r80 r80 -> r120 | r120 <- 0xffffffff
r70 = 0x80000000, r90 = 2 | dspumul r70 r90 -> r120 | r120 <- 0xffffffff

see also
dspiabs dspiadd dspisub dspuadd dspumul dspusub
dspuquadaddui
quad clipped add of unsigned/signed bytes

syntax
    [ if rguard ] dspuquadaddui rsrc1 rsrc2 -> rdest

function
    if rguard then {
        for (i <- 0, m <- 31, n <- 24; i < 4; i <- i + 1, m <- m - 8, n <- n - 8) {
            temp <- zero_ext8to32(rsrc1<m:n>) + sign_ext8to32(rsrc2<m:n>)
            if temp < 0 then
                rdest<m:n> <- 0
            else if temp > 0xff then
                rdest<m:n> <- 0xff
            else
                rdest<m:n> <- temp<7:0>
        }
    }

attributes
    function unit: dspalu
    operation code: 78
    number of operands: 2
    modifier: no
    modifier range: -
    latency: 2
    issue slots: 1, 3

description
as shown below, the dspuquadaddui operation computes four separate sums of the four pairs of corresponding 8-bit bytes of rsrc1 and rsrc2. the bytes in rsrc1 are considered unsigned values; the bytes in rsrc2 are considered signed. the four sums are clipped into the unsigned range [255..0] (or [0xff..0]); thus, the final byte sums are unsigned. all computations are performed without loss of precision.
the dspuquadaddui operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

[figure: the four unsigned bytes of rsrc1 and the four signed bytes of rsrc2 are added pairwise to four full-precision 10-bit signed sums; each sum is clipped to [255..0] and written to the corresponding unsigned byte of rdest]

examples
initial values | operation | result
r30 = 0x02010001, r40 = 0xffffff01 | dspuquadaddui r30 r40 -> r50 | r50 <- 0x01000002
r10 = 0, r60 = 0x9c9c6464, r70 = 0x649c649c | if r10 dspuquadaddui r60 r70 -> r80 | no change, since guard is false
r20 = 1, r60 = 0x9c9c6464, r70 = 0x649c649c | if r20 dspuquadaddui r60 r70 -> r90 | r90 <- 0xff38c800

see also
dspidualadd
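
The mixed signedness is the subtle part of this operation: the same byte pattern 0x9c means 156 in rsrc1 but -100 in rsrc2. The following C sketch models the per-byte loop (guard omitted, names illustrative). It iterates by bit offset rather than with the m and n indices of the function text.

```c
#include <assert.h>
#include <stdint.h>

/* dspuquadaddui model: per byte, unsigned rsrc1 + signed rsrc2, clipped to [0, 255]. */
static uint32_t dspuquadaddui(uint32_t rsrc1, uint32_t rsrc2)
{
    uint32_t rdest = 0;
    for (int shift = 24; shift >= 0; shift -= 8) {
        int32_t a = (int32_t)((rsrc1 >> shift) & 0xffu);     /* zero-extended */
        int32_t raw = (int32_t)((rsrc2 >> shift) & 0xffu);
        int32_t b = (raw & 0x80) ? raw - 0x100 : raw;        /* sign-extended */
        int32_t sum = a + b;                                 /* range -128..382 */
        if (sum < 0)
            sum = 0;
        else if (sum > 0xff)
            sum = 0xff;
        rdest |= (uint32_t)sum << shift;
    }
    return rdest;
}

/* Re-checks the two unguarded results of the examples table. */
static void dspuquadaddui_examples(void)
{
    assert(dspuquadaddui(0x02010001u, 0xffffff01u) == 0x01000002u);
    assert(dspuquadaddui(0x9c9c6464u, 0x649c649cu) == 0xff38c800u);
}
```

In the second example the top byte is 156 + 100 = 256, which clips to 0xff, while the second byte is 156 - 100 = 56 (0x38).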
dspusub
clipped unsigned subtract

syntax
    [ if rguard ] dspusub rsrc1 rsrc2 -> rdest

function
    if rguard then {
        temp <- zero_ext32to64(rsrc1) - zero_ext32to64(rsrc2)
        if (signed)temp < 0 then
            rdest <- 0
        else
            rdest <- temp<31:0>
    }

attributes
    function unit: dspalu
    operation code: 69
    number of operands: 2
    modifier: no
    modifier range: -
    latency: 2
    issue slots: 1, 3

description
as shown below, the dspusub operation computes the unsigned difference rsrc1 - rsrc2, clips the result into the unsigned range [2^32-1..0] (or [0xffffffff..0]), and stores the clipped value into rdest.
the dspusub operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

[figure: rsrc2 is subtracted from rsrc1 (both unsigned) to give a full-precision 33-bit signed result, which is clipped to [2^32-1..0] and written to rdest (unsigned)]

examples
initial values | operation | result
r30 = 0x1200, r40 = 0xff | dspusub r30 r40 -> r60 | r60 <- 0x1101
r10 = 0, r30 = 0x1200, r40 = 0xff | if r10 dspusub r30 r40 -> r80 | no change, since guard is false
r20 = 1, r30 = 0x1200, r40 = 0xff | if r20 dspusub r30 r40 -> r100 | r100 <- 0x1101
r50 = 0, r90 = 1 | dspusub r50 r90 -> r110 | r110 <- 0
r70 = 0x80000001, r80 = 0xffffffff | dspusub r70 r80 -> r120 | r120 <- 0

see also
dspiabs dspiadd dspimul dspisub dspuadd dspumul
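
Because the only clipping case for an unsigned subtract is a negative difference, the operation can be modeled in C without a 64-bit intermediate. This sketch is an equivalent reformulation of the function above, not the data book's own formulation; the guard is omitted and the name is illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* dspusub model: the difference is negative exactly when rsrc2 > rsrc1,
 * in which case the result clips to 0; otherwise it fits in 32 bits. */
static uint32_t dspusub(uint32_t rsrc1, uint32_t rsrc2)
{
    return rsrc1 > rsrc2 ? rsrc1 - rsrc2 : 0u;
}

/* Re-checks rows of the examples table. */
static void dspusub_examples(void)
{
    assert(dspusub(0x1200u, 0xffu) == 0x1101u);
    assert(dspusub(0u, 1u) == 0u);
    assert(dspusub(0x80000001u, 0xffffffffu) == 0u);
}
```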
dualasr
dual-16 arithmetic shift right

syntax
    [ if rguard ] dualasr rsrc1 rsrc2 -> rdest

function
    if rguard then {
        n <- rsrc2<3:0>
        rdest<31:31-n> <- rsrc1<31>
        rdest<30-n:16> <- rsrc1<30:16+n>
        rdest<15:15-n> <- rsrc1<15>
        rdest<14-n:0> <- rsrc1<14:n>
        if rsrc2<31:4> != 0 {
            rdest<31:16> <- rsrc1<31>
            rdest<15:0> <- rsrc1<15>
        }
    }

attributes
    function unit: shifter
    operation code: 102
    number of operands: 2
    modifier: no
    modifier range: -
    latency: 1
    issue slots: 1, 2

description
the argument rsrc1 contains two 16-bit signed integers, rsrc1<31:16> and rsrc1<15:0>. rsrc2 specifies an unsigned shift amount, and the two 16-bit integers are shifted right by this amount. the sign bits rsrc1<31> and rsrc1<15> are replicated as needed within each 16-bit value from the left. if the rsrc2<31:4> value is not zero, this is treated as a shift by 16 or more, i.e. the sign bit is extended across each 16-bit result.
the dualasr operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

[figure: each 16-bit half of rsrc1 passes through its own right shifter controlled by the four lsbs of rsrc2 (n); the sign bit s of each half is replicated into the vacated upper bits of the corresponding half of rdest (example: n = 3, lower 13 bits of each intermediate result retained)]

examples
initial values | operation | result
r30 = 0x70087008, r40 = 0x1 | dualasr r30 r40 -> r50 | r50 <- 0x38043804
r30 = 0x70087008, r40 = 0x2 | dualasr r30 r40 -> r50 | r50 <- 0x1c021c02
r10 = 0, r30 = 0x70087008, r40 = 0x2 | if r10 dualasr r30 r40 -> r50 | no change, since guard is false
r10 = 1, r30 = 0x70084008, r40 = 0x4 | if r10 dualasr r30 r40 -> r50 | r50 <- 0x07000400
r10 = 1, r30 = 0x800c800c, r40 = 0x4 | if r10 dualasr r30 r40 -> r50 | r50 <- 0xf800f800
r10 = 1, r30 = 0x700c700c, r40 = 0xf | if r10 dualasr r30 r40 -> r50 | r50 <- 0x00000000
r10 = 1, r30 = 0x700c800c, r40 = 0xf | if r10 dualasr r30 r40 -> r50 | r50 <- 0x0000ffff
r10 = 1, r30 = 0x800c700c, r40 = 0xf | if r10 dualasr r30 r40 -> r50 | r50 <- 0xffff0000
r10 = 1, r30 = 0x800c700c, r40 = 0x10000000 | if r10 dualasr r30 r40 -> r50 | r50 <- 0xffff0000
r10 = 1, r30 = 0x800c700c, r40 = 0x10 | if r10 dualasr r30 r40 -> r50 | r50 <- 0xffff0000

see also
asl asli asri lsl lsli lsr lsri rol roli
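
The two cases of the function (a shift of 0 to 15 taken from rsrc2<3:0>, and sign fill when any higher bit of rsrc2 is set) can be modeled in C as sketched below. This is an illustrative model with the guard omitted. The helper asr16 is hypothetical and fills sign bits by hand, because right-shifting a negative signed integer is implementation-defined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Arithmetic shift right of a 16-bit value held in bits 15:0, for n in 0..15. */
static uint32_t asr16(uint32_t half, uint32_t n)
{
    uint32_t u = half & 0xffffu;
    uint32_t shifted = u >> n;
    if (u & 0x8000u)
        shifted |= (0xffffu << (16u - n)) & 0xffffu;  /* replicate the sign bit */
    return shifted;
}

/* dualasr model: both halfwords are shifted by the same amount. */
static uint32_t dualasr(uint32_t rsrc1, uint32_t rsrc2)
{
    uint32_t hi = rsrc1 >> 16;
    uint32_t lo = rsrc1 & 0xffffu;
    if ((rsrc2 >> 4) != 0u) {
        /* shift by 16 or more: each half becomes all sign bits */
        return ((hi & 0x8000u) ? 0xffff0000u : 0u) |
               ((lo & 0x8000u) ? 0x0000ffffu : 0u);
    }
    uint32_t n = rsrc2 & 0xfu;
    return (asr16(hi, n) << 16) | asr16(lo, n);
}

/* Re-checks rows of the examples table. */
static void dualasr_examples(void)
{
    assert(dualasr(0x70087008u, 0x1u) == 0x38043804u);
    assert(dualasr(0x70084008u, 0x4u) == 0x07000400u);
    assert(dualasr(0x800c800cu, 0x4u) == 0xf800f800u);
    assert(dualasr(0x700c800cu, 0xfu) == 0x0000ffffu);
    assert(dualasr(0x800c700cu, 0x10000000u) == 0xffff0000u);
}
```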
dualiclipi
Dual-16 clip signed to signed

Syntax
    [ if rguard ] dualiclipi rsrc1 rsrc2 -> rdest

Function
    if rguard then {
        rdest<31:16> <- min(max(rsrc1<31:16>, -rsrc2<15:0>-1), rsrc2<15:0>)
        rdest<15:0> <- min(max(rsrc1<15:0>, -rsrc2<15:0>-1), rsrc2<15:0>)
    }

Attributes
    Function unit: dspalu
    Operation code: 82
    Number of operands: 2
    Modifier: no
    Modifier range: -
    Latency: 2
    Issue slots: 1, 3

Description
The argument rsrc1 contains two signed 16-bit integers, rsrc1<31:16> and rsrc1<15:0>. Each integer value is clipped into the signed integer range (-rsrc2-1) to rsrc2. The value in rsrc2 is an unsigned integer and must be between 0 and 0x7fff inclusive.

The dualiclipi operation optionally takes a guard, specified in rguard. If a guard is present, its LSB controls the modification of the destination register. If the LSB of rguard is 1, rdest is written; otherwise, rdest is not changed.

Examples
    Initial values | Operation | Result
    r30 = 0x00800080, r40 = 0x7f | dualiclipi r30 r40 -> r50 | r50 <- 0x007f007f
    r30 = 0x7fff7fff, r40 = 0x7ffe | dualiclipi r30 r40 -> r50 | r50 <- 0x7ffe7ffe
    r10 = 0, r30 = 0x7fff7fff, r40 = 0x7ffe | if r10 dualiclipi r30 r40 -> r50 | no change, since guard is false
    r10 = 1, r30 = 0x12345678, r40 = 0xabc | if r10 dualiclipi r30 r40 -> r50 | r50 <- 0x0abc0abc
    r10 = 1, r30 = 0x80008000, r40 = 0x03ff | if r10 dualiclipi r30 r40 -> r50 | r50 <- 0xfc00fc00
    r10 = 1, r30 = 0x800003fe, r40 = 0x03ff | if r10 dualiclipi r30 r40 -> r50 | r50 <- 0xfc0003fe
    r10 = 1, r30 = 0x000f03fe, r40 = 0x03ff | if r10 dualiclipi r30 r40 -> r50 | r50 <- 0x000f03fe

See also
    iclipi uclipi dualuclipi imin imax quadumax quadumin
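As an illustration of the min/max formulation, the following Python sketch models dualiclipi on unsigned 32-bit register values. It is a sketch under stated assumptions: the guard is omitted, the helper `to_signed16` is a hypothetical name, and rsrc2 is assumed to respect the documented 0 to 0x7fff range.

```python
def to_signed16(lane):
    """Interpret a 16-bit pattern as a two's-complement signed integer."""
    return lane - 0x10000 if lane & 0x8000 else lane


def dualiclipi(rsrc1, rsrc2):
    """Clip each signed 16-bit half of rsrc1 into [-limit-1, limit]."""
    limit = rsrc2 & 0xFFFF  # documented range: 0 .. 0x7fff

    def clip(lane):
        value = to_signed16(lane)
        return min(max(value, -limit - 1), limit) & 0xFFFF

    return (clip((rsrc1 >> 16) & 0xFFFF) << 16) | clip(rsrc1 & 0xFFFF)
```

The negative bound is one larger in magnitude than the positive bound, which is why 0x8000 clipped with a limit of 0x03ff gives 0xfc00 (-1024) rather than 0xfc01.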
dualuclipi
Dual-16 clip signed to unsigned

Syntax
    [ if rguard ] dualuclipi rsrc1 rsrc2 -> rdest

Function
    if rguard then {
        rdest<31:16> <- min(max(rsrc1<31:16>, 0), rsrc2<15:0>)
        rdest<15:0> <- min(max(rsrc1<15:0>, 0), rsrc2<15:0>)
    }

Attributes
    Function unit: dspalu
    Operation code: 83
    Number of operands: 2
    Modifier: no
    Modifier range: -
    Latency: 2
    Issue slots: 1, 3

Description
The argument rsrc1 contains two 16-bit signed integers, rsrc1<31:16> and rsrc1<15:0>. Each integer value is clipped into the unsigned integer range 0 to rsrc2. The value in rsrc2 is an unsigned integer and must be between 0 and 0xffff inclusive.

The dualuclipi operation optionally takes a guard, specified in rguard. If a guard is present, its LSB controls the modification of the destination register. If the LSB of rguard is 1, rdest is written; otherwise, rdest is not changed.

Examples
    Initial values | Operation | Result
    r30 = 0x00800080, r40 = 0x7f | dualuclipi r30 r40 -> r50 | r50 <- 0x007f007f
    r30 = 0x7fff7fff, r40 = 0x7ffe | dualuclipi r30 r40 -> r50 | r50 <- 0x7ffe7ffe
    r10 = 0, r30 = 0x7fff7fff, r40 = 0x7ffe | if r10 dualuclipi r30 r40 -> r50 | no change, since guard is false
    r10 = 1, r30 = 0x12345678, r40 = 0xabc | if r10 dualuclipi r30 r40 -> r50 | r50 <- 0x0abc0abc
    r10 = 1, r30 = 0x80008000, r40 = 0x03ff | if r10 dualuclipi r30 r40 -> r50 | r50 <- 0x00000000
    r10 = 1, r30 = 0x800003fe, r40 = 0x03ff | if r10 dualuclipi r30 r40 -> r50 | r50 <- 0x000003fe
    r10 = 1, r30 = 0x000f03fe, r40 = 0x03ff | if r10 dualuclipi r30 r40 -> r50 | r50 <- 0x000f03fe

See also
    iclipi uclipi dualiclipi imin imax quadumax quadumin
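The unsigned variant differs from the signed clip only in its lower bound. A minimal Python sketch follows, assuming unsigned 32-bit register values and no guard; the function body is an illustration of the stated min/max formula, not the hardware datapath.

```python
def dualuclipi(rsrc1, rsrc2):
    """Clip each signed 16-bit half of rsrc1 into the range [0, limit]."""
    limit = rsrc2 & 0xFFFF  # documented range: 0 .. 0xffff

    def clip(lane):
        # Interpret the lane as signed so negative inputs clamp to 0.
        value = lane - 0x10000 if lane & 0x8000 else lane
        return min(max(value, 0), limit)

    return (clip((rsrc1 >> 16) & 0xFFFF) << 16) | clip(rsrc1 & 0xFFFF)
```

Because the inputs are signed 16-bit values, no lane can exceed 0x7fff, so limits above 0x7fff only affect negative inputs, which clamp to 0.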
fabsval
Floating-point absolute value

Syntax
    [ if rguard ] fabsval rsrc1 -> rdest

Function
    if rguard then {
        if (float)rsrc1 < 0 then
            rdest <- -(float)rsrc1
        else
            rdest <- (float)rsrc1
    }

Attributes
    Function unit: falu
    Operation code: 115
    Number of operands: 1
    Modifier: no
    Modifier range: -
    Latency: 3
    Issue slots: 1, 4

Description
The fabsval operation computes the absolute value of the argument rsrc1 and stores the result into rdest. All values are in IEEE single-precision floating-point format. If an argument is denormalized, zero is substituted for the argument before computing the absolute value, and the IFZ flag in the PCSW is set. If fabsval causes an IEEE exception, the corresponding exception flags in the PCSW are set.

The PCSW exception flags are sticky: the flags can be set as a side-effect of any floating-point operation but can only be reset by an explicit writepcsw operation. The update of the PCSW exception flags occurs at the same time as rdest is written. If any other floating-point compute operations update the PCSW at the same time, the net result in each exception flag is the logical OR of all simultaneous updates ORed with the existing PCSW value for that exception flag.

The fabsvalflags operation computes the exception flags that would result from an individual fabsval.

The fabsval operation optionally takes a guard, specified in rguard. If a guard is present, its LSB controls the modification of the destination register. If the LSB of rguard is 1, rdest and the exception flags in PCSW are written; otherwise, rdest is not changed and the operation does not affect the exception flags in PCSW.

Examples
    Initial values | Operation | Result
    r30 = 0x40400000 (3.0) | fabsval r30 -> r90 | r90 <- 0x40400000 (3.0)
    r35 = 0xbf800000 (-1.0) | fabsval r35 -> r95 | r95 <- 0x3f800000 (1.0)
    r40 = 0x00400000 (5.877471754e-39) | fabsval r40 -> r100 | r100 <- 0x0 (+0.0), IFZ set
    r45 = 0xffffffff (QNaN) | fabsval r45 -> r105 | r105 <- 0xffffffff (QNaN)
    r50 = 0xffbfffff (SNaN) | fabsval r50 -> r110 | r110 <- 0xffffffff (QNaN), INV set
    r10 = 0, r55 = 0xff7fffff (-3.402823466e+38) | if r10 fabsval r55 -> r115 | no change, since guard is false
    r20 = 1, r55 = 0xff7fffff (-3.402823466e+38) | if r20 fabsval r55 -> r120 | r120 <- 0x7f7fffff (3.402823466e+38)

See also
    iabs dspiabs dspidualabs fabsvalflags readpcsw writepcsw
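The example rows can be reproduced with a small bit-level Python sketch. It is an illustration under assumptions, not a statement of the hardware algorithm: the guard and the sticky PCSW update are not modelled, the flag masks 0x20 (IFZ) and 0x10 (INV) are taken from the fabsvalflags examples, a NaN is classified as signalling when the top fraction bit is clear, and every NaN input is mapped to the 0xffffffff pattern shown in the table.

```python
IFZ = 0x20  # input flushed to zero (mask from the fabsvalflags examples)
INV = 0x10  # invalid operation


def fabsval(bits):
    """Return (result_bits, flags) for a single-precision bit pattern."""
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0 and fraction != 0:
        # Denormalized input: zero is substituted and IFZ is reported.
        return 0x00000000, IFZ
    if exponent == 0xFF and fraction != 0:
        # NaN input: assume a clear top fraction bit marks a signalling NaN.
        flags = 0 if fraction & 0x400000 else INV
        return 0xFFFFFFFF, flags
    # Normal numbers, zeros and infinities: clear the sign bit.
    return bits & 0x7FFFFFFF, 0
```

The second element of the returned pair corresponds to what fabsvalflags would write to its destination for the same input.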
fabsvalflags
IEEE status flags from floating-point absolute value

Syntax
    [ if rguard ] fabsvalflags rsrc1 -> rdest

Function
    if rguard then
        rdest <- ieee_flags(abs_val((float)rsrc1))

Attributes
    Function unit: falu
    Operation code: 116
    Number of operands: 1
    Modifier: no
    Modifier range: -
    Latency: 3
    Issue slots: 1, 4

Description
The fabsvalflags operation computes the IEEE exceptions that would result from computing the absolute value of rsrc1 and writes a bit vector representing the exception flags into rdest. The argument value is in IEEE single-precision floating-point format; the result is an integer bit vector. The bit vector stored in rdest has the same format as the IEEE exception bits in the PCSW. The exception flags in PCSW are left unchanged by this operation. If rsrc1 is denormalized, the IFZ bit in the result is set.

[Figure: flag bit vector layout. Bits 31:7 = 0, bit 6 = OFZ, bit 5 = IFZ, bit 4 = INV, bit 3 = OVF, bit 2 = UNF, bit 1 = INX, bit 0 = DBZ.]

The fabsvalflags operation optionally takes a guard, specified in rguard. If a guard is present, its LSB controls the modification of the destination register. If the LSB of rguard is 1, rdest is written; otherwise, rdest is not changed.

Examples
    Initial values | Operation | Result
    r30 = 0x40400000 (3.0) | fabsvalflags r30 -> r90 | r90 <- 0x0
    r35 = 0xbf800000 (-1.0) | fabsvalflags r35 -> r95 | r95 <- 0x0
    r40 = 0x00400000 (5.877471754e-39) | fabsvalflags r40 -> r100 | r100 <- 0x20 (IFZ)
    r45 = 0xffffffff (QNaN) | fabsvalflags r45 -> r105 | r105 <- 0x0
    r50 = 0xffbfffff (SNaN) | fabsvalflags r50 -> r110 | r110 <- 0x10 (INV)
    r10 = 0, r55 = 0xff7fffff (-3.402823466e+38) | if r10 fabsvalflags r55 -> r115 | no change, since guard is false
    r20 = 1, r55 = 0xff7fffff (-3.402823466e+38) | if r20 fabsvalflags r55 -> r120 | r120 <- 0x0

See also
    fabsval faddflags readpcsw
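The bit vector written by the *flags operations can be decoded with a short Python helper. The mask values below are inferred from the worked examples in this appendix (0x20 for IFZ, 0x10 for INV, 0xa for OVF INX, 0x46 for OFZ UNF INX, 0x21 for IFZ DBZ); the names `FLAG_MASKS` and `decode_flags` are hypothetical and chosen for illustration only.

```python
# (name, mask) pairs ordered from bit 6 down to bit 0.
FLAG_MASKS = [
    ("OFZ", 0x40),
    ("IFZ", 0x20),
    ("INV", 0x10),
    ("OVF", 0x08),
    ("UNF", 0x04),
    ("INX", 0x02),
    ("DBZ", 0x01),
]


def decode_flags(vector):
    """Return the names of the exception flags set in a flags bit vector."""
    return [name for name, mask in FLAG_MASKS if vector & mask]
```

For example, `decode_flags(0x46)` returns the three flag names listed for the denormalized-result case in the faddflags examples.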
fadd
Floating-point add

Syntax
    [ if rguard ] fadd rsrc1 rsrc2 -> rdest

Function
    if rguard then
        rdest <- (float)rsrc1 + (float)rsrc2

Attributes
    Function unit: falu
    Operation code: 22
    Number of operands: 2
    Modifier: no
    Modifier range: -
    Latency: 3
    Issue slots: 1, 4

Description
The fadd operation computes the sum rsrc1 + rsrc2 and stores the result into rdest. All values are in IEEE single-precision floating-point format. Rounding is according to the IEEE rounding mode bits in PCSW. If an argument is denormalized, zero is substituted for the argument before computing the sum, and the IFZ flag in the PCSW is set. If the result is denormalized, the result is set to zero instead, and the OFZ flag in the PCSW is set. If fadd causes an IEEE exception, the corresponding exception flags in the PCSW are set.

The PCSW exception flags are sticky: the flags can be set as a side-effect of any floating-point operation but can only be reset by an explicit writepcsw operation. The update of the PCSW exception flags occurs at the same time as rdest is written. If any other floating-point compute operations update the PCSW at the same time, the net result in each exception flag is the logical OR of all simultaneous updates ORed with the existing PCSW value for that exception flag.

The faddflags operation computes the exception flags that would result from an individual fadd.

The fadd operation optionally takes a guard, specified in rguard. If a guard is present, its LSB controls the modification of the destination register. If the LSB of rguard is 1, rdest and the exception flags in PCSW are written; otherwise, rdest is not changed and the operation does not affect the exception flags in PCSW.

Examples
    Initial values | Operation | Result
    r60 = 0xc0400000 (-3.0), r30 = 0x3f800000 (1.0) | fadd r60 r30 -> r90 | r90 <- 0xc0000000 (-2.0)
    r40 = 0x40400000 (3.0), r60 = 0xc0400000 (-3.0) | fadd r40 r60 -> r95 | r95 <- 0x00000000 (0.0)
    r10 = 0, r40 = 0x40400000 (3.0), r80 = 0x00800000 (1.17549435e-38) | if r10 fadd r40 r80 -> r100 | no change, since guard is false
    r20 = 1, r40 = 0x40400000 (3.0), r80 = 0x00800000 (1.17549435e-38) | if r20 fadd r40 r80 -> r110 | r110 <- 0x40400000 (3.0), INX flag set
    r40 = 0x40400000 (3.0), r81 = 0x00400000 (5.877471754e-39) | fadd r40 r81 -> r111 | r111 <- 0x40400000 (3.0), IFZ flag set
    r82 = 0x00c00000 (1.763241526e-38), r83 = 0x80800000 (-1.175494351e-38) | fadd r82 r83 -> r112 | r112 <- 0x00000000 (0.0), OFZ, UNF, INX flags set
    r84 = 0x7f800000 (+INF), r85 = 0xff800000 (-INF) | fadd r84 r85 -> r113 | r113 <- 0xffffffff (QNaN), INV flag set
    r70 = 0x7f7fffff (3.402823466e+38) | fadd r70 r70 -> r120 | r120 <- 0x7f800000 (+INF), OVF, INX flags set
    r80 = 0x00800000 (1.175494351e-38) | fadd r80 r80 -> r125 | r125 <- 0x01000000 (2.350988702e-38)

See also
    faddflags iadd dspiadd dspidualadd readpcsw writepcsw
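The denormal handling in the examples can be illustrated with a Python sketch that works on single-precision bit patterns. It is deliberately limited and rests on several assumptions: only finite operands whose sum does not overflow are handled (NaN, infinity and the OVF case are out of scope), round-to-nearest is assumed rather than read from the PCSW, a flushed result is returned as +0.0, and the flag masks are the ones appearing in the faddflags examples. The helper names are hypothetical.

```python
import struct
from fractions import Fraction

OFZ, IFZ, UNF, INX = 0x40, 0x20, 0x04, 0x02


def to_float(bits):
    """Reinterpret a 32-bit pattern as a single-precision value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def to_bits(value):
    """Round a Python float to single precision and return its bit pattern."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def is_denormal(bits):
    return (bits & 0x7F800000) == 0 and (bits & 0x007FFFFF) != 0


def fadd_sketch(a_bits, b_bits):
    """Return (result_bits, flags) for finite, non-overflowing operands."""
    flags = 0
    operands = []
    for bits in (a_bits, b_bits):
        if is_denormal(bits):
            flags |= IFZ
            bits &= 0x80000000  # substitute a zero, keeping only the sign
        operands.append(Fraction(to_float(bits)))
    exact = operands[0] + operands[1]  # exact rational sum
    result = to_bits(float(exact))
    if Fraction(to_float(result)) != exact:
        flags |= INX  # rounding changed the value
    if is_denormal(result):
        flags |= OFZ | UNF | INX  # flag combination shown in the examples
        result = 0x00000000
    return result, flags
```

The exact sum is kept as a `Fraction` so that the inexact check is not hidden by double-precision rounding, as would happen for 3.0 plus the smallest normal number.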
faddflags
IEEE status flags from floating-point add

Syntax
    [ if rguard ] faddflags rsrc1 rsrc2 -> rdest

Function
    if rguard then
        rdest <- ieee_flags((float)rsrc1 + (float)rsrc2)

Attributes
    Function unit: falu
    Operation code: 112
    Number of operands: 2
    Modifier: no
    Modifier range: -
    Latency: 3
    Issue slots: 1, 4

Description
The faddflags operation computes the IEEE exceptions that would result from computing the sum rsrc1 + rsrc2 and stores a bit vector representing the exception flags into rdest. The argument values are in IEEE single-precision floating-point format; the result is an integer bit vector. The bit vector stored in rdest has the same format as the IEEE exception bits in the PCSW. The exception flags in PCSW are left unchanged by this operation. Rounding is according to the IEEE rounding mode bits in PCSW. If an argument is denormalized, zero is substituted before computing the sum, and the IFZ bit in the result is set. If the sum would be denormalized, the OFZ bit in the result is set.

[Figure: flag bit vector layout. Bits 31:7 = 0, bit 6 = OFZ, bit 5 = IFZ, bit 4 = INV, bit 3 = OVF, bit 2 = UNF, bit 1 = INX, bit 0 = DBZ.]

The faddflags operation optionally takes a guard, specified in rguard. If a guard is present, its LSB controls the modification of the destination register. If the LSB of rguard is 1, rdest is written; otherwise, rdest is not changed.

Examples
    Initial values | Operation | Result
    r10 = 0x7f7fffff (3.402823466e+38), r20 = 0x3f800000 (1.0) | faddflags r10 r20 -> r60 | r60 <- 0x2 (INX)
    r30 = 0, r10 = 0x7f7fffff (3.402823466e+38) | if r30 faddflags r10 r10 -> r50 | no change, since guard is false
    r40 = 1, r10 = 0x7f7fffff (3.402823466e+38) | if r40 faddflags r10 r10 -> r70 | r70 <- 0xa (OVF INX)
    r80 = 0x00a00000 (1.469367939e-38), r81 = 0x80800000 (-1.17549435e-38) | faddflags r80 r81 -> r100 | r100 <- 0x46 (OFZ UNF INX)
    r95 = 0x7f800000 (+INF), r96 = 0xff800000 (-INF) | faddflags r95 r96 -> r105 | r105 <- 0x10 (INV)
    r98 = 0x40400000 (3.0), r99 = 0x00400000 (5.877471754e-39) | faddflags r98 r99 -> r111 | r111 <- 0x20 (IFZ)

See also
    fadd fsubflags readpcsw
fdiv
Floating-point divide

Syntax
    [ if rguard ] fdiv rsrc1 rsrc2 -> rdest

Function
    if rguard then
        rdest <- (float)rsrc1 / (float)rsrc2

Attributes
    Function unit: ftough
    Operation code: 108
    Number of operands: 2
    Modifier: no
    Modifier range: -
    Latency: 17
    Recovery: 16
    Issue slots: 2

Description
The fdiv operation computes the quotient rsrc1 / rsrc2 and stores the result into rdest. All values are in IEEE single-precision floating-point format. Rounding is according to the IEEE rounding mode bits in PCSW. If an argument is denormalized, zero is substituted for the argument before computing the quotient, and the IFZ flag in the PCSW is set. If the result is denormalized, the result is set to zero instead, and the OFZ flag in the PCSW is set. If fdiv causes an IEEE exception, the corresponding exception flags in the PCSW are set.

The PCSW exception flags are sticky: the flags can be set as a side-effect of any floating-point operation but can only be reset by an explicit writepcsw operation. The update of the PCSW exception flags occurs at the same time as rdest is written. If any other floating-point compute operations update the PCSW at the same time, the net result in each exception flag is the logical OR of all simultaneous updates ORed with the existing PCSW value for that exception flag.

The fdivflags operation computes the exception flags that would result from an individual fdiv.

The fdiv operation optionally takes a guard, specified in rguard. If a guard is present, its LSB controls the modification of the destination register. If the LSB of rguard is 1, rdest and the exception flags in PCSW are written; otherwise, rdest is not changed and the operation does not affect the exception flags in PCSW.

Examples
    Initial values | Operation | Result
    r60 = 0xc0400000 (-3.0), r30 = 0x3f800000 (1.0) | fdiv r60 r30 -> r90 | r90 <- 0xc0400000 (-3.0)
    r40 = 0x40400000 (3.0), r60 = 0xc0400000 (-3.0) | fdiv r40 r60 -> r95 | r95 <- 0xbf800000 (-1.0)
    r10 = 0, r40 = 0x40400000 (3.0), r80 = 0x00800000 (1.17549435e-38) | if r10 fdiv r40 r80 -> r100 | no change, since guard is false
    r20 = 1, r40 = 0x40400000 (3.0), r80 = 0x00800000 (1.17549435e-38) | if r20 fdiv r40 r80 -> r110 | r110 <- 0x7f400000 (2.552117754e+38)
    r40 = 0x40400000 (3.0), r81 = 0x00400000 (5.877471754e-39) | fdiv r40 r81 -> r111 | r111 <- 0x7f800000 (+INF), IFZ, DBZ flags set
    r82 = 0x00c00000 (1.763241526e-38), r83 = 0x80800000 (-1.175494351e-38) | fdiv r82 r83 -> r112 | r112 <- 0xbfc00000 (-1.5)
    r84 = 0x7f800000 (+INF), r85 = 0xff800000 (-INF) | fdiv r84 r85 -> r113 | r113 <- 0xffffffff (QNaN), INV flag set
    r70 = 0x7f7fffff (3.402823466e+38) | fdiv r70 r70 -> r120 | r120 <- 0x3f800000 (1.0)
    r80 = 0x00800000 (1.175494351e-38) | fdiv r80 r80 -> r125 | r125 <- 0x3f800000 (1.0)
    r75 = 0x40400000 (3.0), r76 = 0x0 (0.0) | fdiv r75 r76 -> r126 | r126 <- 0x7f800000 (+INF), DBZ flag set

See also
    fdivflags readpcsw writepcsw
fdivflags
IEEE status flags from floating-point divide

Syntax
    [ if rguard ] fdivflags rsrc1 rsrc2 -> rdest

Function
    if rguard then
        rdest <- ieee_flags((float)rsrc1 / (float)rsrc2)

Attributes
    Function unit: ftough
    Operation code: 109
    Number of operands: 2
    Modifier: no
    Modifier range: -
    Latency: 17
    Recovery: 16
    Issue slots: 2

Description
The fdivflags operation computes the IEEE exceptions that would result from computing the quotient rsrc1 / rsrc2 and stores a bit vector representing the exception flags into rdest. The argument values are in IEEE single-precision floating-point format; the result is an integer bit vector. The bit vector stored in rdest has the same format as the IEEE exception bits in the PCSW. The exception flags in PCSW are left unchanged by this operation. Rounding is according to the IEEE rounding mode bits in PCSW. If an argument is denormalized, zero is substituted before computing the quotient, and the IFZ bit in the result is set. If the quotient would be denormalized, the OFZ bit in the result is set.

[Figure: flag bit vector layout. Bits 31:7 = 0, bit 6 = OFZ, bit 5 = IFZ, bit 4 = INV, bit 3 = OVF, bit 2 = UNF, bit 1 = INX, bit 0 = DBZ.]

The fdivflags operation optionally takes a guard, specified in rguard. If a guard is present, its LSB controls the modification of the destination register. If the LSB of rguard is 1, rdest is written; otherwise, rdest is not changed.

Examples
    Initial values | Operation | Result
    r30 = 0x7f7fffff (3.402823466e+38), r40 = 0x3f800000 (1.0) | fdivflags r30 r40 -> r100 | r100 <- 0
    r10 = 0, r50 = 0x7f7fffff (3.402823466e+38), r60 = 0x3e000000 (0.125) | if r10 fdivflags r50 r60 -> r110 | no change, since guard is false
    r20 = 1, r50 = 0x7f7fffff (3.402823466e+38), r60 = 0x3e000000 (0.125) | if r20 fdivflags r50 r60 -> r111 | r111 <- 0xa (OVF INX)
    r70 = 0x40400000 (3.0), r80 = 0x00400000 (5.877471754e-39) | fdivflags r70 r80 -> r112 | r112 <- 0x21 (IFZ DBZ)
    r85 = 0x7f800000 (+INF), r86 = 0xff800000 (-INF) | fdivflags r85 r86 -> r113 | r113 <- 0x10 (INV)

See also
    fdiv faddflags readpcsw
feql
Floating-point compare equal

Syntax
    [ if rguard ] feql rsrc1 rsrc2 -> rdest

Function
    if rguard then {
        if (float)rsrc1 = (float)rsrc2 then
            rdest <- 1
        else
            rdest <- 0
    }

Attributes
    Function unit: fcomp
    Operation code: 148
    Number of operands: 2
    Modifier: no
    Modifier range: -
    Latency: 1
    Issue slots: 3

Description
The feql operation sets the destination register, rdest, to 1 if the first argument, rsrc1, is equal to the second argument, rsrc2; otherwise, rdest is set to 0. The arguments are treated as IEEE single-precision floating-point values; the result is an integer. If an argument is denormalized, zero is substituted for the argument before computing the comparison, and the IFZ flag in the PCSW is set. If feql causes an IEEE exception, the corresponding exception flags in the PCSW are set.

The PCSW exception flags are sticky: the flags can be set as a side-effect of any floating-point operation but can only be reset by an explicit writepcsw operation. The update of the PCSW exception flags occurs at the same time as rdest is written. If any other floating-point compute operations update the PCSW at the same time, the net result in each exception flag is the logical OR of all simultaneous updates ORed with the existing PCSW value for that exception flag.

The feqlflags operation computes the exception flags that would result from an individual feql.

The feql operation optionally takes a guard, specified in rguard. If a guard is present, its LSB controls the modification of the destination register. If the LSB of rguard is 1, rdest and the exception flags in PCSW are written; otherwise, rdest is not changed and the operation does not affect the exception flags in PCSW.

Examples
    Initial values | Operation | Result
    r30 = 0x40400000 (3.0), r40 = 0 (0.0) | feql r30 r40 -> r80 | r80 <- 0
    r30 = 0x40400000 (3.0) | feql r30 r30 -> r90 | r90 <- 1
    r10 = 0, r60 = 0x3f800000 (1.0), r30 = 0x40400000 (3.0) | if r10 feql r60 r30 -> r100 | no change, since guard is false
    r20 = 1, r60 = 0x3f800000 (1.0), r30 = 0x40400000 (3.0) | if r20 feql r60 r30 -> r110 | r110 <- 0
    r30 = 0x40400000 (3.0), r60 = 0x3f800000 (1.0) | feql r30 r60 -> r120 | r120 <- 0
    r30 = 0x40400000 (3.0), r61 = 0xffffffff (QNaN) | feql r30 r61 -> r121 | r121 <- 0
    r50 = 0x7f800000 (+INF), r55 = 0xff800000 (-INF) | feql r50 r55 -> r125 | r125 <- 0
    r60 = 0x3f800000 (1.0), r65 = 0x00400000 (5.877471754e-39) | feql r60 r65 -> r126 | r126 <- 0, IFZ flag set
    r50 = 0x7f800000 (+INF) | feql r50 r50 -> r127 | r127 <- 1

See also
    ieql feqlflags fneq readpcsw writepcsw
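A compact Python sketch of the comparison, operating on single-precision bit patterns, is given below. It assumes the guard is true, models only the IFZ flag (mask 0x20, as used in the feqlflags examples), and treats any NaN operand as unequal without raising a flag, which matches the QNaN row; the behaviour for signalling NaNs is not shown in the examples and is not modelled. The helper names are hypothetical.

```python
import struct

IFZ = 0x20  # input flushed to zero


def to_float(bits):
    """Reinterpret a 32-bit pattern as a single-precision value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def flush_denormal(bits):
    """Return (bits, flag): denormals become a signed zero and report IFZ."""
    if (bits & 0x7F800000) == 0 and (bits & 0x007FFFFF) != 0:
        return bits & 0x80000000, IFZ
    return bits, 0


def is_nan(bits):
    return (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF) != 0


def feql_sketch(a_bits, b_bits):
    """Return (rdest, flags) for feql with a true guard."""
    a_bits, flag_a = flush_denormal(a_bits)
    b_bits, flag_b = flush_denormal(b_bits)
    flags = flag_a | flag_b
    if is_nan(a_bits) or is_nan(b_bits):
        return 0, flags  # a NaN never compares equal
    return int(to_float(a_bits) == to_float(b_bits)), flags
```

Comparing the decoded values rather than the raw bit patterns keeps +0.0 and -0.0 equal, as IEEE comparison requires.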
feqlflags
IEEE status flags from floating-point compare equal

Syntax
    [ if rguard ] feqlflags rsrc1 rsrc2 -> rdest

Function
    if rguard then
        rdest <- ieee_flags((float)rsrc1 = (float)rsrc2)

Attributes
    Function unit: fcomp
    Operation code: 149
    Number of operands: 2
    Modifier: no
    Modifier range: -
    Latency: 1
    Issue slots: 3

Description
The feqlflags operation computes the IEEE exceptions that would result from computing the comparison rsrc1 = rsrc2 and stores a bit vector representing the exception flags into rdest. The argument values are in IEEE single-precision floating-point format; the result is an integer bit vector. The bit vector stored in rdest has the same format as the IEEE exception bits in the PCSW. The exception flags in PCSW are left unchanged by this operation. If an argument is denormalized, zero is substituted before computing the comparison, and the IFZ bit in the result is set.

[Figure: flag bit vector layout. Bits 31:7 = 0, bit 6 = OFZ, bit 5 = IFZ, bit 4 = INV, bit 3 = OVF, bit 2 = UNF, bit 1 = INX, bit 0 = DBZ.]

The feqlflags operation optionally takes a guard, specified in rguard. If a guard is present, its LSB controls the modification of the destination register. If the LSB of rguard is 1, rdest is written; otherwise, rdest is not changed.

Examples
    Initial values | Operation | Result
    r30 = 0x40400000 (3.0), r40 = 0 (0.0) | feqlflags r30 r40 -> r80 | r80 <- 0
    r30 = 0x40400000 (3.0) | feqlflags r30 r30 -> r90 | r90 <- 0
    r10 = 0, r60 = 0x3f800000 (1.0), r30 = 0x40400000 (3.0) | if r10 feqlflags r60 r30 -> r100 | no change, since guard is false
    r20 = 1, r60 = 0x3f800000 (1.0), r30 = 0x40400000 (3.0) | if r20 feqlflags r60 r30 -> r110 | r110 <- 0
    r30 = 0x40400000 (3.0), r60 = 0x3f800000 (1.0) | feqlflags r30 r60 -> r120 | r120 <- 0
    r30 = 0x40400000 (3.0), r61 = 0xffffffff (QNaN) | feqlflags r30 r61 -> r121 | r121 <- 0
    r50 = 0x7f800000 (+INF), r55 = 0xff800000 (-INF) | feqlflags r50 r55 -> r125 | r125 <- 0
    r60 = 0x3f800000 (1.0), r65 = 0x00400000 (5.877471754e-39) | feqlflags r60 r65 -> r126 | r126 <- 0x20 (IFZ)
    r50 = 0x7f800000 (+INF) | feqlflags r50 r50 -> r127 | r127 <- 0

See also
    feql ieql fgtrflags readpcsw
fgeq
Floating-point compare greater or equal

Syntax
    [ if rguard ] fgeq rsrc1 rsrc2 -> rdest

Function
    if rguard then {
        if (float)rsrc1 >= (float)rsrc2 then
            rdest <- 1
        else
            rdest <- 0
    }

Attributes
    Function unit: fcomp
    Operation code: 146
    Number of operands: 2
    Modifier: no
    Modifier range: -
    Latency: 1
    Issue slots: 3

Description
The fgeq operation sets the destination register, rdest, to 1 if the first argument, rsrc1, is greater than or equal to the second argument, rsrc2; otherwise, rdest is set to 0. The arguments are treated as IEEE single-precision floating-point values; the result is an integer. If an argument is denormalized, zero is substituted for the argument before computing the comparison, and the IFZ flag in the PCSW is set. If fgeq causes an IEEE exception, the corresponding exception flags in the PCSW are set.

The PCSW exception flags are sticky: the flags can be set as a side-effect of any floating-point operation but can only be reset by an explicit writepcsw operation. The update of the PCSW exception flags occurs at the same time as rdest is written. If any other floating-point compute operations update the PCSW at the same time, the net result in each exception flag is the logical OR of all simultaneous updates ORed with the existing PCSW value for that exception flag.

The fgeqflags operation computes the exception flags that would result from an individual fgeq.

The fgeq operation optionally takes a guard, specified in rguard. If a guard is present, its LSB controls the modification of the destination register. If the LSB of rguard is 1, rdest and the exception flags in PCSW are written; otherwise, rdest is not changed and the operation does not affect the exception flags in PCSW.

Examples
    Initial values | Operation | Result
    r30 = 0x40400000 (3.0), r40 = 0 (0.0) | fgeq r30 r40 -> r80 | r80 <- 1
    r30 = 0x40400000 (3.0) | fgeq r30 r30 -> r90 | r90 <- 1
    r10 = 0, r60 = 0x3f800000 (1.0), r30 = 0x40400000 (3.0) | if r10 fgeq r60 r30 -> r100 | no change, since guard is false
    r20 = 1, r60 = 0x3f800000 (1.0), r30 = 0x40400000 (3.0) | if r20 fgeq r60 r30 -> r110 | r110 <- 0
    r30 = 0x40400000 (3.0), r60 = 0x3f800000 (1.0) | fgeq r30 r60 -> r120 | r120 <- 1
    r30 = 0x40400000 (3.0), r61 = 0xffffffff (QNaN) | fgeq r30 r61 -> r121 | r121 <- 0, INV flag set
    r50 = 0x7f800000 (+INF), r55 = 0xff800000 (-INF) | fgeq r50 r55 -> r125 | r125 <- 1
    r60 = 0x3f800000 (1.0), r65 = 0x00400000 (5.877471754e-39) | fgeq r60 r65 -> r126 | r126 <- 1, IFZ flag set
    r50 = 0x7f800000 (+INF) | fgeq r50 r50 -> r127 | r127 <- 1

See also
    igeq fgeqflags fgtr readpcsw writepcsw
fgeqflags
IEEE status flags from floating-point compare greater or equal

Syntax
    [ if rguard ] fgeqflags rsrc1 rsrc2 -> rdest

Function
    if rguard then
        rdest <- ieee_flags((float)rsrc1 >= (float)rsrc2)

Attributes
    Function unit: fcomp
    Operation code: 147
    Number of operands: 2
    Modifier: no
    Modifier range: -
    Latency: 1
    Issue slots: 3

Description
The fgeqflags operation computes the IEEE exceptions that would result from computing the comparison rsrc1 >= rsrc2 and stores a bit vector representing the exception flags into rdest. The argument values are in IEEE single-precision floating-point format; the result is an integer bit vector. The bit vector stored in rdest has the same format as the IEEE exception bits in the PCSW. The exception flags in PCSW are left unchanged by this operation. If an argument is denormalized, zero is substituted before computing the comparison, and the IFZ bit in the result is set.

[Figure: flag bit vector layout. Bits 31:7 = 0, bit 6 = OFZ, bit 5 = IFZ, bit 4 = INV, bit 3 = OVF, bit 2 = UNF, bit 1 = INX, bit 0 = DBZ.]

The fgeqflags operation optionally takes a guard, specified in rguard. If a guard is present, its LSB controls the modification of the destination register. If the LSB of rguard is 1, rdest is written; otherwise, rdest is not changed.

Examples
    Initial values | Operation | Result
    r30 = 0x40400000 (3.0), r40 = 0 (0.0) | fgeqflags r30 r40 -> r80 | r80 <- 0
    r30 = 0x40400000 (3.0) | fgeqflags r30 r30 -> r90 | r90 <- 0
    r10 = 0, r60 = 0x3f800000 (1.0), r30 = 0x40400000 (3.0) | if r10 fgeqflags r60 r30 -> r100 | no change, since guard is false
    r20 = 1, r60 = 0x3f800000 (1.0), r30 = 0x40400000 (3.0) | if r20 fgeqflags r60 r30 -> r110 | r110 <- 0
    r30 = 0x40400000 (3.0), r60 = 0x3f800000 (1.0) | fgeqflags r30 r60 -> r120 | r120 <- 0
    r30 = 0x40400000 (3.0), r61 = 0xffffffff (QNaN) | fgeqflags r30 r61 -> r121 | r121 <- 0x10 (INV)
    r50 = 0x7f800000 (+INF), r55 = 0xff800000 (-INF) | fgeqflags r50 r55 -> r125 | r125 <- 0
    r60 = 0x3f800000 (1.0), r65 = 0x00400000 (5.877471754e-39) | fgeqflags r60 r65 -> r126 | r126 <- 0x20 (IFZ)
    r50 = 0x7f800000 (+INF) | fgeqflags r50 r50 -> r127 | r127 <- 0

See also
    fgeq igeq fgtrflags readpcsw
fgtr
Floating-point compare greater

Syntax
    [ if rguard ] fgtr rsrc1 rsrc2 -> rdest

Function
    if rguard then {
        if (float)rsrc1 > (float)rsrc2 then
            rdest <- 1
        else
            rdest <- 0
    }

Attributes
    Function unit: fcomp
    Operation code: 144
    Number of operands: 2
    Modifier: no
    Modifier range: -
    Latency: 1
    Issue slots: 3

Description
The fgtr operation sets the destination register, rdest, to 1 if the first argument, rsrc1, is greater than the second argument, rsrc2; otherwise, rdest is set to 0. The arguments are treated as IEEE single-precision floating-point values; the result is an integer. If an argument is denormalized, zero is substituted for the argument before computing the comparison, and the IFZ flag in the PCSW is set. If fgtr causes an IEEE exception, the corresponding exception flags in the PCSW are set.

The PCSW exception flags are sticky: the flags can be set as a side-effect of any floating-point operation but can only be reset by an explicit writepcsw operation. The update of the PCSW exception flags occurs at the same time as rdest is written. If any other floating-point compute operations update the PCSW at the same time, the net result in each exception flag is the logical OR of all simultaneous updates ORed with the existing PCSW value for that exception flag.

The fgtrflags operation computes the exception flags that would result from an individual fgtr.

The fgtr operation optionally takes a guard, specified in rguard. If a guard is present, its LSB controls the modification of the destination register. If the LSB of rguard is 1, rdest and the exception flags in PCSW are written; otherwise, rdest is not changed and the operation does not affect the exception flags in PCSW.

Examples
    Initial values | Operation | Result
    r30 = 0x40400000 (3.0), r40 = 0 (0.0) | fgtr r30 r40 -> r80 | r80 <- 1
    r30 = 0x40400000 (3.0) | fgtr r30 r30 -> r90 | r90 <- 0
    r10 = 0, r60 = 0x3f800000 (1.0), r30 = 0x40400000 (3.0) | if r10 fgtr r60 r30 -> r100 | no change, since guard is false
    r20 = 1, r60 = 0x3f800000 (1.0), r30 = 0x40400000 (3.0) | if r20 fgtr r60 r30 -> r110 | r110 <- 0
    r30 = 0x40400000 (3.0), r60 = 0x3f800000 (1.0) | fgtr r30 r60 -> r120 | r120 <- 1
    r30 = 0x40400000 (3.0), r61 = 0xffffffff (QNaN) | fgtr r30 r61 -> r121 | r121 <- 0, INV flag set
    r50 = 0x7f800000 (+INF), r55 = 0xff800000 (-INF) | fgtr r50 r55 -> r125 | r125 <- 1
    r60 = 0x3f800000 (1.0), r65 = 0x00400000 (5.877471754e-39) | fgtr r60 r65 -> r126 | r126 <- 1, IFZ flag set
    r50 = 0x7f800000 (+INF) | fgtr r50 r50 -> r127 | r127 <- 0

See also
    igtr fgtrflags fgeq readpcsw writepcsw
fgtrflags
IEEE status flags from floating-point compare greater

Syntax
    [ if rguard ] fgtrflags rsrc1 rsrc2 -> rdest

Function
    if rguard then
        rdest <- ieee_flags((float)rsrc1 > (float)rsrc2)

Attributes
    Function unit: fcomp
    Operation code: 145
    Number of operands: 2
    Modifier: no
    Modifier range: -
    Latency: 1
    Issue slots: 3

Description
The fgtrflags operation computes the IEEE exceptions that would result from computing the comparison rsrc1 > rsrc2 and stores a bit vector representing the exception flags into rdest. The argument values are in IEEE single-precision floating-point format; the result is an integer bit vector. The bit vector stored in rdest has the same format as the IEEE exception bits in the PCSW. The exception flags in PCSW are left unchanged by this operation. If an argument is denormalized, zero is substituted before computing the comparison, and the IFZ bit in the result is set.

[Figure: flag bit vector layout. Bits 31:7 = 0, bit 6 = OFZ, bit 5 = IFZ, bit 4 = INV, bit 3 = OVF, bit 2 = UNF, bit 1 = INX, bit 0 = DBZ.]

The fgtrflags operation optionally takes a guard, specified in rguard. If a guard is present, its LSB controls the modification of the destination register. If the LSB of rguard is 1, rdest is written; otherwise, rdest is not changed.

Examples
    Initial values | Operation | Result
    r30 = 0x40400000 (3.0), r40 = 0 (0.0) | fgtrflags r30 r40 -> r80 | r80 <- 0
    r30 = 0x40400000 (3.0) | fgtrflags r30 r30 -> r90 | r90 <- 0
    r10 = 0, r60 = 0x3f800000 (1.0), r30 = 0x40400000 (3.0) | if r10 fgtrflags r60 r30 -> r100 | no change, since guard is false
    r20 = 1, r60 = 0x3f800000 (1.0), r30 = 0x40400000 (3.0) | if r20 fgtrflags r60 r30 -> r110 | r110 <- 0
    r30 = 0x40400000 (3.0), r60 = 0x3f800000 (1.0) | fgtrflags r30 r60 -> r120 | r120 <- 0
    r30 = 0x40400000 (3.0), r61 = 0xffffffff (QNaN) | fgtrflags r30 r61 -> r121 | r121 <- 0x10 (INV)
    r50 = 0x7f800000 (+INF), r55 = 0xff800000 (-INF) | fgtrflags r50 r55 -> r125 | r125 <- 0
    r60 = 0x3f800000 (1.0), r65 = 0x00400000 (5.877471754e-39) | fgtrflags r60 r65 -> r126 | r126 <- 0x20 (IFZ)
    r50 = 0x7f800000 (+INF) | fgtrflags r50 r50 -> r127 | r127 <- 0

See also
    fgtr igtr fgeqflags readpcsw
fleq
floating-point compare less-than or equal
pseudo-op for fgeq

syntax
[ if rguard ] fleq rsrc1 rsrc2 → rdest

function
if rguard then { if (float)rsrc1 <= (float)rsrc2 then rdest ← 1 else rdest ← 0 }

attributes
function unit: fcomp
operation code: 146
number of operands: 2
modifier: no
modifier range: -
latency: 1
issue slots: 3

description
the fleq operation is a pseudo operation transformed by the scheduler into an fgeq with the arguments exchanged (fleq's rsrc1 is fgeq's rsrc2 and vice versa). (note: pseudo operations cannot be used in assembly source files.)

the fleq operation sets the destination register, rdest, to 1 if the first argument, rsrc1, is less than or equal to the second argument, rsrc2; otherwise, rdest is set to 0. the arguments are treated as ieee single-precision floating-point values; the result is an integer. if an argument is denormalized, zero is substituted for the argument before computing the comparison, and the ifz flag in the pcsw is set. if fleq causes an ieee exception, the corresponding exception flags in the pcsw are set.

the pcsw exception flags are sticky: the flags can be set as a side-effect of any floating-point operation but can only be reset by an explicit writepcsw operation. the update of the pcsw exception flags occurs at the same time as rdest is written. if any other floating-point compute operations update the pcsw at the same time, the net result in each exception flag is the logical or of all simultaneous updates ored with the existing pcsw value for that exception flag.

the fleqflags operation computes the exception flags that would result from an individual fleq.

the fleq operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest and the exception flags in pcsw are written; otherwise, rdest is not changed and the operation does not affect the exception flags in pcsw.

examples
initial values | operation | result
r30 = 0x40400000 (3.0), r40 = 0 (0.0) | fleq r30 r40 → r80 | r80 ← 0
r30 = 0x40400000 (3.0) | fleq r30 r30 → r90 | r90 ← 1
r10 = 0, r60 = 0x3f800000 (1.0), r30 = 0x40400000 (3.0) | if r10 fleq r60 r30 → r100 | no change, since guard is false
r20 = 1, r60 = 0x3f800000 (1.0), r30 = 0x40400000 (3.0) | if r20 fleq r60 r30 → r110 | r110 ← 1
r30 = 0x40400000 (3.0), r60 = 0x3f800000 (1.0) | fleq r30 r60 → r120 | r120 ← 0
r30 = 0x40400000 (3.0), r61 = 0xffffffff (qnan) | fleq r30 r61 → r121 | r121 ← 0, inv flag set
r50 = 0x7f800000 (+inf), r55 = 0xff800000 (-inf) | fleq r50 r55 → r125 | r125 ← 0
r60 = 0x3f800000 (1.0), r65 = 0x00400000 (5.877471754e-39) | fleq r60 r65 → r126 | r126 ← 0, ifz flag set
r50 = 0x7f800000 (+inf) | fleq r50 r50 → r127 | r127 ← 1

see also
ileq fgeq fleqflags readpcsw writepcsw
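The argument exchange described for fleq can be sketched in C as a small host-side model. This is only an illustration under stated assumptions: `model_fgeq`, `model_fleq` and the helper names are hypothetical, the flag values 0x10 (inv) and 0x20 (ifz) are taken from the flags examples in this appendix, and treating any NaN operand as raising inv with a result of 0 is inferred from the qnan example rather than from a stated rule.

```c
#include <stdint.h>
#include <string.h>

/* Flag values as they appear in the examples (assumed, not a full PCSW model). */
enum { FLAG_INV = 0x10, FLAG_IFZ = 0x20 };

/* Reinterpret a 32-bit register image as an IEEE single-precision value. */
static float bits_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Denormalized: exponent field all zeros, fraction non-zero. */
static int is_denormal(uint32_t bits)
{
    return (bits & 0x7f800000u) == 0 && (bits & 0x007fffffu) != 0;
}

/* NaN: exponent field all ones, fraction non-zero. */
static int is_nan(uint32_t bits)
{
    return (bits & 0x7f800000u) == 0x7f800000u && (bits & 0x007fffffu) != 0;
}

/* Hypothetical model of fgeq: returns 1 if src1 >= src2, ORs flags into *flags. */
static uint32_t model_fgeq(uint32_t src1, uint32_t src2, uint32_t *flags)
{
    /* Denormalized inputs are replaced by zero; keeping the sign bit is an
       assumption and does not change the outcome of a comparison. */
    if (is_denormal(src1)) { src1 &= 0x80000000u; *flags |= FLAG_IFZ; }
    if (is_denormal(src2)) { src2 &= 0x80000000u; *flags |= FLAG_IFZ; }
    if (is_nan(src1) || is_nan(src2)) { *flags |= FLAG_INV; return 0; }
    return bits_to_float(src1) >= bits_to_float(src2) ? 1u : 0u;
}

/* fleq is fgeq with the two source arguments exchanged. */
static uint32_t model_fleq(uint32_t src1, uint32_t src2, uint32_t *flags)
{
    return model_fgeq(src2, src1, flags);
}
```

Calling `model_fleq(0x3f800000, 0x00400000, &flags)` follows the 1.0 versus denormalized row of the table: the denormalized operand is flushed to zero, the comparison 1.0 <= 0.0 fails, and only the ifz bit is reported.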
fleqflags
ieee status flags from floating-point compare less-than or equal
pseudo-op for fgeqflags

syntax
[ if rguard ] fleqflags rsrc1 rsrc2 → rdest

function
if rguard then rdest ← ieee_flags((float)rsrc1 <= (float)rsrc2)

attributes
function unit: fcomp
operation code: 147
number of operands: 2
modifier: no
modifier range: -
latency: 1
issue slots: 3

description
the fleqflags operation is a pseudo operation transformed by the scheduler into an fgeqflags with the arguments exchanged (fleqflags's rsrc1 is fgeqflags's rsrc2 and vice versa). (note: pseudo operations cannot be used in assembly source files.)

the fleqflags operation computes the ieee exceptions that would result from computing the comparison rsrc1 <= rsrc2 and stores a bit vector representing the exception flags into rdest. the argument values are in ieee single-precision floating-point format; the result is an integer bit vector. the bit vector stored in rdest has the same format as the ieee exception bits in the pcsw. the exception flags in pcsw are left unchanged by this operation. if an argument is denormalized, zero is substituted before computing the comparison, and the ifz bit in the result is set.

the fleqflags operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

examples
initial values | operation | result
r30 = 0x40400000 (3.0), r40 = 0 (0.0) | fleqflags r30 r40 → r80 | r80 ← 0
r30 = 0x40400000 (3.0) | fleqflags r30 r30 → r90 | r90 ← 0
r10 = 0, r60 = 0x3f800000 (1.0), r30 = 0x40400000 (3.0) | if r10 fleqflags r60 r30 → r100 | no change, since guard is false
r20 = 1, r60 = 0x3f800000 (1.0), r30 = 0x40400000 (3.0) | if r20 fleqflags r60 r30 → r110 | r110 ← 0
r30 = 0x40400000 (3.0), r60 = 0x3f800000 (1.0) | fleqflags r30 r60 → r120 | r120 ← 0
r30 = 0x40400000 (3.0), r61 = 0xffffffff (qnan) | fleqflags r30 r61 → r121 | r121 ← 0x10 (inv)
r50 = 0x7f800000 (+inf), r55 = 0xff800000 (-inf) | fleqflags r50 r55 → r125 | r125 ← 0
r60 = 0x3f800000 (1.0), r65 = 0x00400000 (5.877471754e-39) | fleqflags r60 r65 → r126 | r126 ← 0x20 (ifz)
r50 = 0x7f800000 (+inf) | fleqflags r50 r50 → r127 | r127 ← 0

result bit vector: bit 6 = ofz, bit 5 = ifz, bit 4 = inv, bit 3 = ovf, bit 2 = unf, bit 1 = inx, bit 0 = dbz; bits 31..7 = 0

see also
fleq ileq fgeqflags readpcsw
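To make the result bit vector easier to read, the following C sketch decodes it into flag names. It is an illustration, not part of the data book: the function name `decode_ieee_flags` is hypothetical, and the bit positions are inferred from the hex values in the flags examples (0x20 = ifz, 0x10 = inv, and, in the arithmetic flags tables, 0x40 = ofz, 0x08 = ovf, 0x04 = unf, 0x02 = inx), with dbz assumed to occupy the remaining bit 0.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Flag names indexed by bit position (bit 0 = dbz ... bit 6 = ofz). */
static const char *const flag_names[7] = {
    "dbz", "inx", "unf", "ovf", "inv", "ifz", "ofz"
};

/* Write the names of all set flags into out, most significant bit first,
   separated by single spaces. Returns out. */
static char *decode_ieee_flags(uint32_t flags, char *out, size_t out_size)
{
    if (out_size == 0) {
        return out;
    }
    out[0] = '\0';
    for (int bit = 6; bit >= 0; --bit) {
        if (flags & (1u << bit)) {
            if (out[0] != '\0') {
                strncat(out, " ", out_size - strlen(out) - 1);
            }
            strncat(out, flag_names[bit], out_size - strlen(out) - 1);
        }
    }
    return out;
}
```

With a 32-byte buffer, the value 0x20 decodes to "ifz" and 0x10 to "inv", matching the annotations in the result column.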
fles
floating-point compare less-than
pseudo-op for fgtr

syntax
[ if rguard ] fles rsrc1 rsrc2 → rdest

function
if rguard then { if (float)rsrc1 < (float)rsrc2 then rdest ← 1 else rdest ← 0 }

attributes
function unit: fcomp
operation code: 144
number of operands: 2
modifier: no
modifier range: -
latency: 1
issue slots: 3

description
the fles operation is a pseudo operation transformed by the scheduler into an fgtr with the arguments exchanged (fles's rsrc1 is fgtr's rsrc2 and vice versa). (note: pseudo operations cannot be used in assembly source files.)

the fles operation sets the destination register, rdest, to 1 if the first argument, rsrc1, is less than the second argument, rsrc2; otherwise, rdest is set to 0. the arguments are treated as ieee single-precision floating-point values; the result is an integer. if an argument is denormalized, zero is substituted for the argument before computing the comparison, and the ifz flag in the pcsw is set. if fles causes an ieee exception, the corresponding exception flags in the pcsw are set.

the pcsw exception flags are sticky: the flags can be set as a side-effect of any floating-point operation but can only be reset by an explicit writepcsw operation. the update of the pcsw exception flags occurs at the same time as rdest is written. if any other floating-point compute operations update the pcsw at the same time, the net result in each exception flag is the logical or of all simultaneous updates ored with the existing pcsw value for that exception flag.

the flesflags operation computes the exception flags that would result from an individual fles.

the fles operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest and the exception flags in pcsw are written; otherwise, rdest is not changed and the operation does not affect the exception flags in pcsw.

examples
initial values | operation | result
r30 = 0x40400000 (3.0), r40 = 0 (0.0) | fles r30 r40 → r80 | r80 ← 0
r30 = 0x40400000 (3.0) | fles r30 r30 → r90 | r90 ← 0
r10 = 0, r60 = 0x3f800000 (1.0), r30 = 0x40400000 (3.0) | if r10 fles r60 r30 → r100 | no change, since guard is false
r20 = 1, r60 = 0x3f800000 (1.0), r30 = 0x40400000 (3.0) | if r20 fles r60 r30 → r110 | r110 ← 1
r30 = 0x40400000 (3.0), r60 = 0x3f800000 (1.0) | fles r30 r60 → r120 | r120 ← 0
r30 = 0x40400000 (3.0), r61 = 0xffffffff (qnan) | fles r30 r61 → r121 | r121 ← 0, inv flag set
r50 = 0x7f800000 (+inf), r55 = 0xff800000 (-inf) | fles r50 r55 → r125 | r125 ← 0
r60 = 0x3f800000 (1.0), r65 = 0x00400000 (5.877471754e-39) | fles r60 r65 → r126 | r126 ← 0, ifz flag set
r50 = 0x7f800000 (+inf) | fles r50 r50 → r127 | r127 ← 0

see also
iles fgtr flesflags readpcsw writepcsw
flesflags
ieee status flags from floating-point compare less-than
pseudo-op for fgtrflags

syntax
[ if rguard ] flesflags rsrc1 rsrc2 → rdest

function
if rguard then rdest ← ieee_flags((float)rsrc1 < (float)rsrc2)

attributes
function unit: fcomp
operation code: 145
number of operands: 2
modifier: no
modifier range: -
latency: 1
issue slots: 3

description
the flesflags operation is a pseudo operation transformed by the scheduler into an fgtrflags with the arguments exchanged (flesflags's rsrc1 is fgtrflags's rsrc2 and vice versa). (note: pseudo operations cannot be used in assembly source files.)

the flesflags operation computes the ieee exceptions that would result from computing the comparison rsrc1 < rsrc2 and stores a bit vector representing the exception flags into rdest.

fmul
floating-point multiply

syntax
[ if rguard ] fmul rsrc1 rsrc2 → rdest

function
if rguard then rdest ← (float)rsrc1 × (float)rsrc2

attributes
function unit: ifmul
operation code: 28
number of operands: 2
modifier: no
modifier range: -
latency: 3
issue slots: 2, 3

description
the fmul operation computes the product rsrc1 × rsrc2 and stores the result into rdest. all values are in ieee single-precision floating-point format. rounding is according to the ieee rounding mode bits in pcsw. if an argument is denormalized, zero is substituted for the argument before computing the product, and the ifz flag in the pcsw is set. if the result is denormalized, the result is set to zero instead, and the ofz flag in the pcsw is set. if fmul causes an ieee exception, the corresponding exception flags in the pcsw are set.

the pcsw exception flags are sticky: the flags can be set as a side-effect of any floating-point operation but can only be reset by an explicit writepcsw operation. the update of the pcsw exception flags occurs at the same time as rdest is written. if any other floating-point compute operations update the pcsw at the same time, the net result in each exception flag is the logical or of all simultaneous updates ored with the existing pcsw value for that exception flag.

the fmulflags operation computes the exception flags that would result from an individual fmul.

the fmul operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest and the exception flags in pcsw are written; otherwise, rdest is not changed and the operation does not affect the exception flags in pcsw.

examples
initial values | operation | result
r60 = 0xc0400000 (-3.0), r30 = 0x3f800000 (1.0) | fmul r60 r30 → r90 | r90 ← 0xc0400000 (-3.0)
r40 = 0x40400000 (3.0), r60 = 0xc0400000 (-3.0) | fmul r40 r60 → r95 | r95 ← 0xc1100000 (-9.0)
r10 = 0, r40 = 0x40400000 (3.0), r80 = 0x00800000 (1.17549435e-38) | if r10 fmul r40 r80 → r100 | no change, since guard is false
r20 = 1, r40 = 0x40400000 (3.0), r80 = 0x00800000 (1.17549435e-38) | if r20 fmul r40 r80 → r105 | r105 ← 0x01400000 (3.52648305e-38)
r41 = 0x3f000000 (0.5), r80 = 0x00800000 (1.17549435e-38) | fmul r41 r80 → r110 | r110 ← 0x0, ofz, unf, inx flags set
r42 = 0x7f800000 (+inf), r43 = 0x0 (0.0) | fmul r42 r43 → r106 | r106 ← 0xffffffff (qnan), inv flag set
r40 = 0x40400000 (3.0), r81 = 0x00400000 (5.877471754e-39) | fmul r40 r81 → r111 | r111 ← 0, ifz flag set
r82 = 0x00c00000 (1.763241526e-38), r83 = 0x80800000 (-1.175494351e-38) | fmul r82 r83 → r112 | r112 ← 0, unf, inx flags set
r84 = 0x7f800000 (+inf), r85 = 0xff800000 (-inf) | fmul r84 r85 → r113 | r113 ← 0xff800000 (-inf)
r70 = 0x7f7fffff (3.402823466e+38) | fmul r70 r70 → r120 | r120 ← 0x7f800000, ovf, inx flags set
r80 = 0x00800000 (1.17549435e-38) | fmul r80 r80 → r125 | r125 ← 0, unf, inx flags set

see also
imul umul dspimul dspidualmul fmulflags readpcsw writepcsw
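The sticky behaviour of the exception flags can be pictured with a small C model. This is a sketch under assumptions, not the hardware mechanism: `pcsw_model_t`, `pcsw_commit_cycle` and `pcsw_clear_flags` are hypothetical names, only the 7-bit flag vector is modelled (not the full pcsw register), and the clear function merely stands in for an explicit writepcsw.

```c
#include <stdint.h>

/* Hypothetical stand-in for the exception-flag field of the pcsw. */
typedef struct {
    uint32_t ieee_flags; /* bit 6 = ofz ... bit 0 = dbz, as in the flags tables */
} pcsw_model_t;

/* Apply the flag updates of all floating-point operations that write back
   at the same time: the updates are ORed together and then ORed with the
   existing value, so a flag that is already set stays set. */
static void pcsw_commit_cycle(pcsw_model_t *pcsw,
                              const uint32_t *updates, int count)
{
    uint32_t merged = 0;
    for (int i = 0; i < count; ++i) {
        merged |= updates[i];
    }
    pcsw->ieee_flags |= merged & 0x7fu;
}

/* Flags are only reset explicitly; this models that step. */
static void pcsw_clear_flags(pcsw_model_t *pcsw, uint32_t mask)
{
    pcsw->ieee_flags &= ~mask;
}
```

In this model, two operations retiring together with flag vectors 0x46 and 0x20 leave 0x66 in the flag field, and a later operation that raises no exception does not clear any of those bits.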
fmulflags
ieee status flags from floating-point multiply

syntax
[ if rguard ] fmulflags rsrc1 rsrc2 → rdest

function
if rguard then rdest ← ieee_flags((float)rsrc1 × (float)rsrc2)

attributes
function unit: ifmul
operation code: 143
number of operands: 2
modifier: no
modifier range: -
latency: 3
issue slots: 2, 3

description
the fmulflags operation computes the ieee exceptions that would result from computing the product rsrc1 × rsrc2 and stores a bit vector representing the exception flags into rdest. the argument values are in ieee single-precision floating-point format; the result is an integer bit vector. the bit vector stored in rdest has the same format as the ieee exception bits in the pcsw. the exception flags in pcsw are left unchanged by this operation. rounding is according to the ieee rounding mode bits in pcsw. if an argument is denormalized, zero is substituted before computing the product, and the ifz bit in the result is set. if the product would be denormalized, the ofz bit in the result is set.

the fmulflags operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

examples
initial values | operation | result
r60 = 0xc0400000 (-3.0), r30 = 0x3f800000 (1.0) | fmulflags r60 r30 → r90 | r90 ← 0
r40 = 0x40400000 (3.0), r60 = 0xc0400000 (-3.0) | fmulflags r40 r60 → r95 | r95 ← 0
r10 = 0, r40 = 0x40400000 (3.0), r80 = 0x00800000 (1.17549435e-38) | if r10 fmulflags r40 r80 → r100 | no change, since guard is false
r20 = 1, r40 = 0x40400000 (3.0), r80 = 0x00800000 (1.17549435e-38) | if r20 fmulflags r40 r80 → r105 | r105 ← 0
r41 = 0x3f000000 (0.5), r80 = 0x00800000 (1.17549435e-38) | fmulflags r41 r80 → r110 | r110 ← 0x46 (ofz unf inx)
r42 = 0x7f800000 (+inf), r43 = 0x0 (0.0) | fmulflags r42 r43 → r106 | r106 ← 0x10 (inv)
r40 = 0x40400000 (3.0), r81 = 0x00400000 (5.877471754e-39) | fmulflags r40 r81 → r111 | r111 ← 0x20 (ifz)
r82 = 0x00c00000 (1.763241526e-38), r83 = 0x80800000 (-1.175494351e-38) | fmulflags r82 r83 → r112 | r112 ← 0x06 (unf inx)
r84 = 0x7f800000 (+inf), r85 = 0xff800000 (-inf) | fmulflags r84 r85 → r113 | r113 ← 0
r70 = 0x7f7fffff (3.402823466e+38) | fmulflags r70 r70 → r120 | r120 ← 0x0a (ovf inx)
r80 = 0x00800000 (1.17549435e-38) | fmulflags r80 r80 → r125 | r125 ← 0x06 (unf inx)

result bit vector: bit 6 = ofz, bit 5 = ifz, bit 4 = inv, bit 3 = ovf, bit 2 = unf, bit 1 = inx, bit 0 = dbz; bits 31..7 = 0

see also
fmul faddflags readpcsw
fneq
floating-point compare not equal

syntax
[ if rguard ] fneq rsrc1 rsrc2 → rdest

function
if rguard then { if (float)rsrc1 != (float)rsrc2 then rdest ← 1 else rdest ← 0 }

attributes
function unit: fcomp
operation code: 150
number of operands: 2
modifier: no
modifier range: -
latency: 1
issue slots: 3

description
the fneq operation sets the destination register, rdest, to 1 if the first argument, rsrc1, is not equal to the second argument, rsrc2; otherwise, rdest is set to 0. the arguments are treated as ieee single-precision floating-point values; the result is an integer. if an argument is denormalized, zero is substituted for the argument before computing the comparison, and the ifz flag in the pcsw is set. if fneq causes an ieee exception, the corresponding exception flags in the pcsw are set.

the pcsw exception flags are sticky: the flags can be set as a side-effect of any floating-point operation but can only be reset by an explicit writepcsw operation. the update of the pcsw exception flags occurs at the same time as rdest is written. if any other floating-point compute operations update the pcsw at the same time, the net result in each exception flag is the logical or of all simultaneous updates ored with the existing pcsw value for that exception flag.

the fneqflags operation computes the exception flags that would result from an individual fneq.

the fneq operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest and the exception flags in pcsw are written; otherwise, rdest is not changed and the operation does not affect the exception flags in pcsw.

examples
initial values | operation | result
r30 = 0x40400000 (3.0), r40 = 0 (0.0) | fneq r30 r40 → r80 | r80 ← 1
r30 = 0x40400000 (3.0) | fneq r30 r30 → r90 | r90 ← 0
r10 = 0, r60 = 0x3f800000 (1.0), r30 = 0x40400000 (3.0) | if r10 fneq r60 r30 → r100 | no change, since guard is false
r20 = 1, r60 = 0x3f800000 (1.0), r30 = 0x40400000 (3.0) | if r20 fneq r60 r30 → r110 | r110 ← 1
r30 = 0x40400000 (3.0), r60 = 0x3f800000 (1.0) | fneq r30 r60 → r120 | r120 ← 1
r30 = 0x40400000 (3.0), r61 = 0xffffffff (qnan) | fneq r30 r61 → r121 | r121 ← 0
r50 = 0x7f800000 (+inf), r55 = 0xff800000 (-inf) | fneq r50 r55 → r125 | r125 ← 1
r60 = 0x3f800000 (1.0), r65 = 0x00400000 (5.877471754e-39) | fneq r60 r65 → r126 | r126 ← 1, ifz flag set
r50 = 0x7f800000 (+inf) | fneq r50 r50 → r127 | r127 ← 0

see also
ineq feql fneqflags readpcsw writepcsw
fneqflags
ieee status flags from floating-point compare not equal

syntax
[ if rguard ] fneqflags rsrc1 rsrc2 → rdest

function
if rguard then rdest ← ieee_flags((float)rsrc1 != (float)rsrc2)

attributes
function unit: fcomp
operation code: 151
number of operands: 2
modifier: no
modifier range: -
latency: 1
issue slots: 3

description
the fneqflags operation computes the ieee exceptions that would result from computing the comparison rsrc1 != rsrc2 and stores a bit vector representing the exception flags into rdest. the argument values are in ieee single-precision floating-point format; the result is an integer bit vector. the bit vector stored in rdest has the same format as the ieee exception bits in the pcsw. the exception flags in pcsw are left unchanged by this operation. if an argument is denormalized, zero is substituted before computing the comparison, and the ifz bit in the result is set.

the fneqflags operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

examples
initial values | operation | result
r30 = 0x40400000 (3.0), r40 = 0 (0.0) | fneqflags r30 r40 → r80 | r80 ← 0
r30 = 0x40400000 (3.0) | fneqflags r30 r30 → r90 | r90 ← 0
r10 = 0, r60 = 0x3f800000 (1.0), r30 = 0x40400000 (3.0) | if r10 fneqflags r60 r30 → r100 | no change, since guard is false
r20 = 1, r60 = 0x3f800000 (1.0), r30 = 0x40400000 (3.0) | if r20 fneqflags r60 r30 → r110 | r110 ← 0
r30 = 0x40400000 (3.0), r60 = 0x3f800000 (1.0) | fneqflags r30 r60 → r120 | r120 ← 0
r30 = 0x40400000 (3.0), r61 = 0xffffffff (qnan) | fneqflags r30 r61 → r121 | r121 ← 0
r50 = 0x7f800000 (+inf), r55 = 0xff800000 (-inf) | fneqflags r50 r55 → r125 | r125 ← 0
r60 = 0x3f800000 (1.0), r65 = 0x00400000 (5.877471754e-39) | fneqflags r60 r65 → r126 | r126 ← 0x20 (ifz)
r50 = 0x7f800000 (+inf) | fneqflags r50 r50 → r127 | r127 ← 0

result bit vector: bit 6 = ofz, bit 5 = ifz, bit 4 = inv, bit 3 = ovf, bit 2 = unf, bit 1 = inx, bit 0 = dbz; bits 31..7 = 0

see also
fneq ineq fleqflags readpcsw
fsign
sign of floating-point value

syntax
[ if rguard ] fsign rsrc1 → rdest

function
if rguard then { if (float)rsrc1 = 0.0 then rdest ← 0 else if (float)rsrc1 < 0.0 then rdest ← 0xffffffff else rdest ← 1 }

attributes
function unit: fcomp
operation code: 152
number of operands: 1
modifier: no
modifier range: -
latency: 1
issue slots: 3

description
the fsign operation sets the destination register, rdest, to either 0, 1, or -1 depending on the sign of the argument in rsrc1. rdest is set to 0 if rsrc1 is equal to zero, to 1 if rsrc1 is positive, or to -1 if rsrc1 is negative. the argument is treated as an ieee single-precision floating-point value; the result is an integer. if the argument is denormalized, zero is substituted before computing the comparison, and the ifz flag in the pcsw is set; thus, the result of fsign for a denormalized argument is 0. if fsign causes an ieee exception, the corresponding exception flags in the pcsw are set.

the pcsw exception flags are sticky: the flags can be set as a side-effect of any floating-point operation but can only be reset by an explicit writepcsw operation. the update of the pcsw exception flags occurs at the same time as rdest is written. if any other floating-point compute operations update the pcsw at the same time, the net result in each exception flag is the logical or of all simultaneous updates ored with the existing pcsw value for that exception flag.

the fsignflags operation computes the exception flags that would result from an individual fsign.

the fsign operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest and the exception flags in pcsw are written; otherwise, rdest is not changed and the operation does not affect the exception flags in pcsw.

examples
initial values | operation | result
r30 = 0x40400000 (3.0) | fsign r30 → r100 | r100 ← 1
r40 = 0xbf800000 (-1.0) | fsign r40 → r105 | r105 ← 0xffffffff (-1)
r50 = 0x80800000 (-1.175494351e-38) | fsign r50 → r110 | r110 ← 0xffffffff (-1)
r60 = 0x80400000 (-5.877471754e-39) | fsign r60 → r115 | r115 ← 0, ifz flag set
r10 = 0, r70 = 0xffffffff (qnan) | if r10 fsign r70 → r116 | no change, since guard is false
r20 = 1, r70 = 0xffffffff (qnan) | if r20 fsign r70 → r117 | r117 ← 0, inv flag set
r80 = 0xff800000 (-inf) | fsign r80 → r120 | r120 ← 0xffffffff (-1)

see also
fsignflags readpcsw writepcsw
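The fsign cases, including the denormalized and qnan rows of the table, can be reproduced by looking only at the bit fields of the argument. The C sketch below is illustrative: `model_fsign` is a hypothetical name, the flag values 0x10 (inv) and 0x20 (ifz) come from the fsignflags examples, and returning 0 with inv for any NaN is an assumption generalised from the single qnan example.

```c
#include <stdint.h>

/* Flag values as they appear in the fsignflags examples. */
enum { FSIGN_FLAG_INV = 0x10, FSIGN_FLAG_IFZ = 0x20 };

/* Hypothetical model of fsign on a raw single-precision bit pattern.
   Returns 0, 1 or 0xffffffff (-1) and ORs raised flags into *flags. */
static uint32_t model_fsign(uint32_t src1, uint32_t *flags)
{
    uint32_t exponent = src1 & 0x7f800000u;
    uint32_t fraction = src1 & 0x007fffffu;

    /* NaN: exponent all ones with a non-zero fraction. */
    if (exponent == 0x7f800000u && fraction != 0) {
        *flags |= FSIGN_FLAG_INV;
        return 0;
    }
    /* Zero, or a denormalized value that is replaced by zero. */
    if (exponent == 0) {
        if (fraction != 0) {
            *flags |= FSIGN_FLAG_IFZ;
        }
        return 0;
    }
    /* Normalized numbers and infinities: the sign bit decides. */
    return (src1 & 0x80000000u) ? 0xffffffffu : 1u;
}
```

Note that the smallest normalized negative value, 0x80800000, still yields -1, while the denormalized 0x80400000 yields 0 with ifz, which is the distinction the table draws between r50 and r60.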
fsignflags
ieee status flags from floating-point sign

syntax
[ if rguard ] fsignflags rsrc1 → rdest

function
if rguard then rdest ← ieee_flags(sign((float)rsrc1))

attributes
function unit: fcomp
operation code: 153
number of operands: 1
modifier: no
modifier range: -
latency: 1
issue slots: 3

description
the fsignflags operation computes the ieee exceptions that would result from computing the sign of rsrc1 and stores a bit vector representing the exception flags into rdest. the argument value is in ieee single-precision floating-point format; the result is an integer bit vector. the bit vector stored in rdest has the same format as the ieee exception bits in the pcsw. the exception flags in pcsw are left unchanged by this operation. if the argument is denormalized, zero is substituted before computing the sign, and the ifz bit in the result is set.

the fsignflags operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

examples
initial values | operation | result
r30 = 0x40400000 (3.0) | fsignflags r30 → r100 | r100 ← 0
r40 = 0xbf800000 (-1.0) | fsignflags r40 → r105 | r105 ← 0
r50 = 0x80800000 (-1.175494351e-38) | fsignflags r50 → r110 | r110 ← 0
r60 = 0x80400000 (-5.877471754e-39) | fsignflags r60 → r115 | r115 ← 0x20 (ifz)
r10 = 0, r70 = 0xffffffff (qnan) | if r10 fsignflags r70 → r116 | no change, since guard is false
r20 = 1, r70 = 0xffffffff (qnan) | if r20 fsignflags r70 → r117 | r117 ← 0x10 (inv)
r80 = 0xff800000 (-inf) | fsignflags r80 → r120 | r120 ← 0

result bit vector: bit 6 = ofz, bit 5 = ifz, bit 4 = inv, bit 3 = ovf, bit 2 = unf, bit 1 = inx, bit 0 = dbz; bits 31..7 = 0

see also
fsign readpcsw
fsqrt
floating-point square root

syntax
[ if rguard ] fsqrt rsrc1 → rdest

function
if rguard then rdest ← square_root(rsrc1)

attributes
function unit: ftough
operation code: 110
number of operands: 1
modifier: no
modifier range: -
latency: 17
recovery: 16
issue slots: 2

description
the fsqrt operation computes the square root of rsrc1 and stores the result into rdest. all values are in ieee single-precision floating-point format. rounding is according to the ieee rounding mode bits in pcsw. if an argument is denormalized, zero is substituted for the argument before computing the square root, and the ifz flag in the pcsw is set. if the result is denormalized, the result is set to zero instead, and the ofz flag in the pcsw is set. if fsqrt causes an ieee exception, the corresponding exception flags in the pcsw are set.

the pcsw exception flags are sticky: the flags can be set as a side-effect of any floating-point operation but can only be reset by an explicit writepcsw operation. the update of the pcsw exception flags occurs at the same time as rdest is written. if any other floating-point compute operations update the pcsw at the same time, the net result in each exception flag is the logical or of all simultaneous updates ored with the existing pcsw value for that exception flag.

the fsqrtflags operation computes the exception flags that would result from an individual fsqrt.

the fsqrt operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest and the exception flags in pcsw are written; otherwise, rdest is not changed and the operation does not affect the exception flags in pcsw.

examples
initial values | operation | result
r60 = 0xc0400000 (-3.0) | fsqrt r60 → r90 | r90 ← 0xffffffff (qnan), inv flag set
r40 = 0x40400000 (3.0) | fsqrt r40 → r95 | r95 ← 0x3fddb3d7 (1.732051), inx flag set
r10 = 0, r40 = 0x40400000 (3.0) | if r10 fsqrt r40 → r100 | no change, since guard is false
r20 = 1, r40 = 0x40400000 (3.0) | if r20 fsqrt r40 → r110 | r110 ← 0x3fddb3d7 (1.732051), inx flag set
r82 = 0x00c00000 (1.763241526e-38) | fsqrt r82 → r112 | r112 ← 0x201cc471 (1.32787105e-19), inx flag set
r84 = 0x7f800000 (+inf) | fsqrt r84 → r113 | r113 ← 0x7f800000 (+inf)
r70 = 0x7f7fffff (3.402823466e+38) | fsqrt r70 → r120 | r120 ← 0x5f7fffff (1.8446743e19), inx flag set
r80 = 0x00400000 (5.877471754e-39) | fsqrt r80 → r125 | r125 ← 0, ifz flag set

see also
fsqrtflags readpcsw writepcsw
fsqrtflags
ieee status flags from floating-point square root

syntax
[ if rguard ] fsqrtflags rsrc1 → rdest

function
if rguard then rdest ← ieee_flags(square_root((float)rsrc1))

attributes
function unit: ftough
operation code: 111
number of operands: 1
modifier: no
modifier range: -
latency: 17
recovery: 16
issue slots: 2

description
the fsqrtflags operation computes the ieee exceptions that would result from computing the square root of rsrc1 and stores a bit vector representing the exception flags into rdest. the argument value is in ieee single-precision floating-point format; the result is an integer bit vector. the bit vector stored in rdest has the same format as the ieee exception bits in the pcsw. the exception flags in pcsw are left unchanged by this operation. rounding is according to the ieee rounding mode bits in pcsw. if the argument is denormalized, zero is substituted before computing the square root, and the ifz bit in the result is set. if the result would be denormalized, the ofz bit in the result is set.

the fsqrtflags operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

examples
initial values | operation | result
r60 = 0xc0400000 (-3.0) | fsqrtflags r60 → r90 | r90 ← 0x10 (inv)
r40 = 0x40400000 (3.0) | fsqrtflags r40 → r95 | r95 ← 0x2 (inx)
r10 = 0, r40 = 0x40400000 (3.0) | if r10 fsqrtflags r40 → r100 | no change, since guard is false
r20 = 1, r40 = 0x40400000 (3.0) | if r20 fsqrtflags r40 → r110 | r110 ← 0x2 (inx)
r82 = 0x00c00000 (1.763241526e-38) | fsqrtflags r82 → r112 | r112 ← 0x2 (inx)
r84 = 0x7f800000 (+inf) | fsqrtflags r84 → r113 | r113 ← 0
r70 = 0x7f7fffff (3.402823466e+38) | fsqrtflags r70 → r120 | r120 ← 0x2 (inx)
r80 = 0x00400000 (5.877471754e-39) | fsqrtflags r80 → r125 | r125 ← 0x20 (ifz)

result bit vector: bit 6 = ofz, bit 5 = ifz, bit 4 = inv, bit 3 = ovf, bit 2 = unf, bit 1 = inx, bit 0 = dbz; bits 31..7 = 0

see also
fsqrt readpcsw
fsub
floating-point subtract

syntax
[ if rguard ] fsub rsrc1 rsrc2 → rdest

function
if rguard then rdest ← (float)rsrc1 - (float)rsrc2

attributes
function unit: falu
operation code: 113
number of operands: 2
modifier: no
modifier range: -
latency: 3
issue slots: 1, 4

description
the fsub operation computes the difference rsrc1 - rsrc2 and writes the result into rdest. all values are in ieee single-precision floating-point format. rounding is according to the ieee rounding mode bits in pcsw. if an argument is denormalized, zero is substituted for the argument before computing the difference, and the ifz flag in the pcsw is set. if the result is denormalized, the result is set to zero instead, and the ofz flag in the pcsw is set. if fsub causes an ieee exception, the corresponding exception flags in the pcsw are set.

the pcsw exception flags are sticky: the flags can be set as a side-effect of any floating-point operation but can only be reset by an explicit writepcsw operation. the update of the pcsw exception flags occurs at the same time as rdest is written. if any other floating-point compute operations update the pcsw at the same time, the net result in each exception flag is the logical or of all simultaneous updates ored with the existing pcsw value for that exception flag.

the fsubflags operation computes the exception flags that would result from an individual fsub.

the fsub operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest and the exception flags in pcsw are written; otherwise, rdest is not changed and the operation does not affect the exception flags in pcsw.

examples
initial values | operation | result
r60 = 0xc0400000 (-3.0), r30 = 0x3f800000 (1.0) | fsub r60 r30 → r90 | r90 ← 0xc0800000 (-4.0)
r40 = 0x40400000 (3.0), r60 = 0xc0400000 (-3.0) | fsub r40 r60 → r95 | r95 ← 0x40c00000 (6.0)
r10 = 0, r40 = 0x40400000 (3.0), r80 = 0x00800000 (1.17549435e-38) | if r10 fsub r40 r80 → r100 | no change, since guard is false
r20 = 1, r40 = 0x40400000 (3.0), r80 = 0x00800000 (1.17549435e-38) | if r20 fsub r40 r80 → r110 | r110 ← 0x40400000 (3.0), inx flag set
r40 = 0x40400000 (3.0), r81 = 0x00400000 (5.877471754e-39) | fsub r40 r81 → r111 | r111 ← 0x40400000 (3.0), ifz flag set
r82 = 0x00c00000 (1.763241526e-38), r83 = 0x00800000 (1.175494351e-38) | fsub r82 r83 → r112 | r112 ← 0x0, ofz, unf and inx flags set
r84 = 0x7f800000 (+inf), r85 = 0x7f800000 (+inf) | fsub r84 r85 → r113 | r113 ← 0xffffffff (qnan), inv flag set
r70 = 0x7f7fffff (3.402823466e+38), r86 = 0xff7fffff (-3.402823466e+38) | fsub r70 r86 → r120 | r120 ← 0x7f800000 (+inf), ovf, inx flags set
r87 = 0xffffffff (qnan), r30 = 0x3f800000 (1.0) | fsub r87 r30 → r125 | r125 ← 0xffffffff (qnan)
r87 = 0xffbfffff (snan), r30 = 0x3f800000 (1.0) | fsub r87 r30 → r125 | r125 ← 0xffffffff (qnan), inv flag set
r83 = 0x00800001 (1.175494421e-38), r89 = 0x00800000 (1.175494351e-38) | fsub r83 r89 → r126 | r126 ← 0x0, ofz, unf and inx flags set

see also
fsubflags isub dspisub dspidualsub readpcsw writepcsw
fsubflags
ieee status flags from floating-point subtract

syntax
[ if rguard ] fsubflags rsrc1 rsrc2 → rdest

function
if rguard then rdest ← ieee_flags((float)rsrc1 - (float)rsrc2)

attributes
function unit: falu
operation code: 114
number of operands: 2
modifier: no
modifier range: -
latency: 3
issue slots: 1, 4

description
the fsubflags operation computes the ieee exceptions that would result from computing the difference rsrc1 - rsrc2 and writes a bit vector representing the exception flags into rdest. the argument values are in ieee single-precision floating-point format; the result is an integer bit vector. the bit vector stored in rdest has the same format as the ieee exception bits in the pcsw. the exception flags in pcsw are left unchanged by this operation. rounding is according to the ieee rounding mode bits in pcsw. if an argument is denormalized, zero is substituted before computing the difference, and the ifz bit in the result is set. if the difference would be denormalized, the ofz bit in the result is set.

the fsubflags operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

examples
initial values | operation | result
r60 = 0xc0400000 (-3.0), r30 = 0x3f800000 (1.0) | fsubflags r60 r30 → r90 | r90 ← 0
r40 = 0x40400000 (3.0), r60 = 0xc0400000 (-3.0) | fsubflags r40 r60 → r95 | r95 ← 0
r10 = 0, r40 = 0x40400000 (3.0), r80 = 0x00800000 (1.17549435e-38) | if r10 fsubflags r40 r80 → r100 | no change, since guard is false
r20 = 1, r40 = 0x40400000 (3.0), r80 = 0x00800000 (1.17549435e-38) | if r20 fsubflags r40 r80 → r110 | r110 ← 0x2 (inx)
r40 = 0x40400000 (3.0), r81 = 0x00400000 (5.877471754e-39) | fsubflags r40 r81 → r111 | r111 ← 0x20 (ifz)
r82 = 0x00c00000 (1.763241526e-38), r83 = 0x00800000 (1.175494351e-38) | fsubflags r82 r83 → r112 | r112 ← 0x40 (ofz)
r84 = 0x7f800000 (+inf), r85 = 0x7f800000 (+inf) | fsubflags r84 r85 → r113 | r113 ← 0x10 (inv)
r70 = 0x7f7fffff (3.402823466e+38), r86 = 0xff7fffff (-3.402823466e+38) | fsubflags r70 r86 → r120 | r120 ← 0xa (ovf, inx)
r87 = 0xffffffff (qnan), r30 = 0x3f800000 (1.0) | fsubflags r87 r30 → r125 | r125 ← 0x0
r87 = 0xffbfffff (snan), r30 = 0x3f800000 (1.0) | fsubflags r87 r30 → r125 | r125 ← 0x10 (inv)
r83 = 0x00800001 (1.175494421e-38), r89 = 0x00800000 (1.175494351e-38) | fsubflags r83 r89 → r126 | r126 ← 0x4 (unf)

result bit vector: bit 6 = ofz, bit 5 = ifz, bit 4 = inv, bit 3 = ovf, bit 2 = unf, bit 1 = inx, bit 0 = dbz; bits 31..7 = 0

see also
fsub faddflags readpcsw
funshift1
funnel-shift 1 byte

syntax
[ if rguard ] funshift1 rsrc1 rsrc2 → rdest

function
if rguard then
rdest<31:8> ← rsrc1<23:0>
rdest<7:0> ← rsrc2<31:24>

attributes
function unit: shifter
operation code: 99
number of operands: 2
modifier: no
modifier range: -
latency: 1
issue slots: 1, 2

description
as shown below, the funshift1 operation effectively shifts left by one byte the 64-bit concatenation of rsrc1 and rsrc2 and writes the most-significant 32 bits of the shifted result to rdest.

[figure: bit-field diagram of rsrc1, rsrc2 and rdest, bits 31..0]

the funshift1 operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

examples
initial values | operation | result
r30 = 0xaabbccdd, r40 = 0x11223344 | funshift1 r30 r40 → r50 | r50 ← 0xbbccdd11
r10 = 0, r40 = 0x11223344, r30 = 0xaabbccdd | if r10 funshift1 r40 r30 → r60 | no change, since guard is false
r20 = 1, r40 = 0x11223344, r30 = 0xaabbccdd | if r20 funshift1 r40 r30 → r70 | r70 ← 0x223344aa

see also
funshift2 funshift3 rol
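The description of funshift1 as a one-byte left shift of a 64-bit concatenation can be written directly in C. This is a minimal sketch for checking the examples on a host machine; `model_funshift1` is a hypothetical name and the guard is left out.

```c
#include <stdint.h>

/* Hypothetical model of funshift1: concatenate src1 (high word) and
   src2 (low word), shift the 64-bit value left by 8 bits, and keep the
   most-significant 32 bits. */
static uint32_t model_funshift1(uint32_t src1, uint32_t src2)
{
    uint64_t pair = ((uint64_t)src1 << 32) | src2;
    return (uint32_t)((pair << 8) >> 32);
}
```

The same result can be obtained field by field as `(src1 << 8) | (src2 >> 24)`, which mirrors the two assignments in the function definition.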
funshift2
funnel-shift 2 bytes

syntax
    [ if rguard ] funshift2 rsrc1 rsrc2 → rdest

function
    if rguard then
        rdest<31:16> ← rsrc1<15:0>
        rdest<15:0>  ← rsrc2<31:16>

attributes
    function unit       shifter
    operation code      100
    number of operands  2
    modifier            no
    modifier range      -
    latency             1
    issue slots         1, 2

description
as shown below, the funshift2 operation effectively shifts the 64-bit concatenation of rsrc1 and rsrc2 left by two bytes and writes the most-significant 32 bits of the shifted result to rdest.
the funshift2 operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

[figure: byte layout of rsrc1, rsrc2, and rdest (bits 31..0)]

examples

    initial values                                  operation                        result
    r30 = 0xaabbccdd, r40 = 0x11223344              funshift2 r30 r40 → r50          r50 ← 0xccdd1122
    r10 = 0, r40 = 0x11223344, r30 = 0xaabbccdd     if r10 funshift2 r40 r30 → r60   no change, since guard is false
    r20 = 1, r40 = 0x11223344, r30 = 0xaabbccdd     if r20 funshift2 r40 r30 → r70   r70 ← 0x3344aabb

see also
    funshift1 funshift3 rol
funshift3
funnel-shift 3 bytes

syntax
    [ if rguard ] funshift3 rsrc1 rsrc2 → rdest

function
    if rguard then
        rdest<31:24> ← rsrc1<7:0>
        rdest<23:0>  ← rsrc2<31:8>

attributes
    function unit       shifter
    operation code      101
    number of operands  2
    modifier            no
    modifier range      -
    latency             1
    issue slots         1, 2

description
as shown below, the funshift3 operation effectively shifts the 64-bit concatenation of rsrc1 and rsrc2 left by three bytes and writes the most-significant 32 bits of the shifted result to rdest.
the funshift3 operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

[figure: byte layout of rsrc1, rsrc2, and rdest (bits 31..0)]

examples

    initial values                                  operation                        result
    r30 = 0xaabbccdd, r40 = 0x11223344              funshift3 r30 r40 → r50          r50 ← 0xdd112233
    r10 = 0, r40 = 0x11223344, r30 = 0xaabbccdd     if r10 funshift3 r40 r30 → r60   no change, since guard is false
    r20 = 1, r40 = 0x11223344, r30 = 0xaabbccdd     if r20 funshift3 r40 r30 → r70   r70 ← 0x44aabbcc

see also
    funshift1 funshift2 rol
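The three funnel-shift operations differ only in the byte count, so they can be modeled by one function. This is an illustrative sketch: the name `funshift` and its `nbytes` parameter are hypothetical and do not correspond to a real mnemonic.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def funshift(src1: int, src2: int, nbytes: int) -> int:
    """Model funshift1/2/3: shift the 64-bit value src1:src2 left by
    nbytes bytes and return the most-significant 32 bits."""
    if nbytes not in (1, 2, 3):
        raise ValueError("nbytes must be 1, 2, or 3")
    combined = ((src1 & MASK32) << 32) | (src2 & MASK32)
    shifted = (combined << (8 * nbytes)) & MASK64
    return shifted >> 32


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, hex(funshift(0xAABBCCDD, 0x11223344, n)))
```

The unguarded examples on the three pages (0xbbccdd11, 0xccdd1122, 0xdd112233) fall out of the same computation with `nbytes` set to 1, 2, and 3.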
pnx1300/01/02/11 data book philips semiconductors a-67 preliminary specification clipped signed absolute value syntax [ if r guard ] h_dspiabs r0 r src2 r dest function if r guard then { if r src2 >= 0 then r dest r src2 else if rsrc2 = 0x80000000 then r dest 0x7fffffff else r dest ?r src2 } attributes function unit dspalu operation code 65 number of operands 2 modifier no modifier range ? latency 2 issue slots 1, 3 description the h_dspiabs operation computes the absolute value of rsrc2, clips the result into the range [0x0..0x7fffffff], and stores the clipped value into r dest . all values are signed integers. this operation requires a zero as first argument. the programmer is advised to use the unary pseudo operation dspiabs instead. the h_dspiabs operation optionally takes a guard, specified in r guard . if a guard is present, its lsb controls the modification of the destination register. if the lsb of r guard is 1, r dest is written; otherwise, r dest is not changed. examples initial values operation result r30 = 0xffffffff h_dspiabs r0 r30 r60 r60 0x00000001 r10 = 0, r40 = 0x80000001 if r10 h_dspiabs r0 r40 r70 no change, since guard is false r20 = 1, r40 = 0x80000001 if r20 h_dspiabs r0 r40 r100 r100 0x7fffffff r50 = 0x80000000 h_dspiabs r0 r50 r80 r80 0x7fffffff r90 = 0x7fffffff h_dspiabs r0 r90 r110 r110 0x7fffffff see also h_dspiabs dspidualabs dspiadd dspimul dspisub dspuadd dspumul dspusub h_dspiabs
philips semiconductors pnx1300/01/02/11 dspcpu operations preliminary specification a-68 dual clipped absolute val ue of signed 16-bit halfwords syntax [ if r guard ] h_dspidualabs r0 rsrc2 r dest function if r guard then { temp1 sign_ext16to32(rsrc2<15:0>) temp2 sign_ext16to32(rsrc2<31:16>) if temp1 = 0xffff 8000 then temp1 0x7fff if temp2 = 0xffff 8000 then temp2 0x7fff if temp1 < 0 then temp1 ?temp1 if temp2 < 0 then temp2 ?temp2 r dest <31:16> temp2<15:0> r dest <15:0> temp1<15:0> } attributes function unit dspalu operation code 72 number of operands 2 modifier no modifier range ? latency 2 issue slots 1, 3 description the h_ dspidualabs operation performs two 16-bit clipped, signed absolute value computations separately on the high and low 16-bit halfwords of rsrc2. both absolute values are clipped into the range [0x0..0x7fff] and written into the corresponding halfwords of r dest . all values are signed 16-bit integers. this operation requires a zero as first argument. the programmer is advised to use the dspidualabs pseudo operation instead. the h_ dspidualabs operation optionally takes a guard, specified in r guard . if a guard is present, its lsb controls the modification of the destinati on register. if the lsb of r guard is 1, r dest is written; otherwise, r dest is not changed. examples initial values operation result r30 = 0xffff0032 h_dspidualabs r0 r30 r60 r60 0x00010032 r10 = 0, r40 = 0x80008001 if r10 h_dspidualabs r0 r40 r70 no change, since guard is false r20 = 1, r40 = 0x80008001 if r20 h_dspidualabs r0 r40 r100 r100 0x7fff7fff r50 = 0x0032ffff h_dspidualabs r0 r50 r80 r80 0x00320001 r90 = 0x7fffffff h_dspidualabs r0 r90 r110 r110 0x7fff0001 see also dspidualabs dspiabs dspidualadd dspidualmul dspidualsub dspiabs h_dspidualabs
pnx1300/01/02/11 data book philips semiconductors a-69 preliminary specification hardware absolute value syntax [ if r guard ] h_iabs r0 rsrc2 r dest function if r guard then { if rsrc2 < 0 then r dest ?rsrc2 else r dest rsrc2 } attributes function unit alu operation code 44 number of operands 2 modifier no modifier range ? latency 1 issue slots 1, 2, 3, 4, 5 description the h_ iabs operation computes the absolute value of rsrc2 and stores the result into r dest . the argument is a signed integer; the result is an unsigned integer. this op eration requires a zero as first argument. the programmer is advised to use the iabs pseudo operation instead. the h_iabs operation optionally takes a guard, specified in r guard . if a guard is present, its lsb controls the modification of the destination register. if the lsb of r guard is 1, r dest is written; otherwise, r dest is not changed. examples initial values operation result r30 = 0xffffffff h_iabs r0 r30 r60 r60 0x00000001 r10 = 0, r40 = 0xfffffff4 if r10 h_iabs r0 r40 r80 no change, since guard is false r20 = 1, r40 = 0xfffffff4 if r20 h_iabs r0 r40 r90 r90 0xc r50 = 0x80000001 h_iabs r0 r50 r100 r100 0x7fffffff r60 = 0x80000000 h_iabs r0 r60 r110 r110 0x80000000 r20 = 1 h_iabs r0 r20 r120 r120 1 see also iabs fabsval h_iabs
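The difference between h_iabs and h_dspiabs shows up only for the input 0x80000000. A minimal sketch of both behaviors follows; the Python function names mirror the mnemonics for readability, and the helper `to_signed32` is an assumption of this sketch.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x: int) -> int:
    """Interpret the low 32 bits of x as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def h_iabs(src2: int) -> int:
    """Unclipped absolute value; the result is read as unsigned."""
    return abs(to_signed32(src2)) & MASK32


def h_dspiabs(src2: int) -> int:
    """Absolute value clipped into [0x0..0x7fffffff]."""
    return min(abs(to_signed32(src2)), 0x7FFFFFFF)


if __name__ == "__main__":
    print(hex(h_iabs(0x80000000)), hex(h_dspiabs(0x80000000)))
```

For every other input the two functions agree; only the most negative value is clipped by the dsp variant.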
h_st16d
hardware 16-bit store with displacement

syntax
    [ if rguard ] h_st16d(d) rsrc1 rsrc2

function
    if rguard then {
        if pcsw.bytesex = little_endian then bs ← 1 else bs ← 0
        mem[rsrc2 + d + (1 ⊕ bs)] ← rsrc1<7:0>
        mem[rsrc2 + d + (0 ⊕ bs)] ← rsrc1<15:8>
    }

attributes
    function unit       dmem
    operation code      30
    number of operands  2
    modifier            7 bits
    modifier range      -128..126 by 2
    latency             n/a
    issue slots         4, 5

description
the h_st16d operation stores the least-significant 16-bit halfword of rsrc1 into the memory locations pointed to by the address in rsrc2 + d. the d value is an opcode modifier, must be in the range -128 to 126 inclusive, and must be a multiple of 2. this store operation is performed as little-endian or big-endian depending on the current setting of the bytesex bit in the pcsw.
if h_st16d is misaligned (the memory address computed by rsrc2 + d is not a multiple of 2), the result of h_st16d is undefined, and the mse (misaligned store exception) bit in the pcsw register is set to 1. additionally, if the trpmse (trap on misaligned store exception) bit in pcsw is 1, exception processing will be requested on the next interruptible jump.
the h_st16d operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the addressed memory locations (and the modification of cache if the locations are cacheable). if the lsb of rguard is 1, the store takes effect. if the lsb of rguard is 0, h_st16d has no side effects whatever; in particular, the lru and other status bits in the data cache are not affected.

examples

    initial values                              operation                      result
    r10 = 0xcfe, r80 = 0x44332211               h_st16d(2) r80 r10             [0xd00] ← 0x22, [0xd01] ← 0x11
    r50 = 0, r20 = 0xd05, r70 = 0xaabbccdd      if r50 h_st16d(-4) r70 r20     no change, since guard is false
    r60 = 1, r30 = 0xd06, r70 = 0xaabbccdd      if r60 h_st16d(-4) r70 r30     [0xd02] ← 0xcc, [0xd03] ← 0xdd

see also
    st16 st16d st8 st8d st32 st32d readpcsw ijmpf
h_st32d
hardware 32-bit store with displacement

syntax
    [ if rguard ] h_st32d(d) rsrc1 rsrc2

function
    if rguard then {
        if pcsw.bytesex = little_endian then bs ← 3 else bs ← 0
        mem[rsrc2 + d + (3 ⊕ bs)] ← rsrc1<7:0>
        mem[rsrc2 + d + (2 ⊕ bs)] ← rsrc1<15:8>
        mem[rsrc2 + d + (1 ⊕ bs)] ← rsrc1<23:16>
        mem[rsrc2 + d + (0 ⊕ bs)] ← rsrc1<31:24>
    }

attributes
    function unit       dmem
    operation code      31
    number of operands  2
    modifier            7 bits
    modifier range      -256..252 by 4
    latency             n/a
    issue slots         4, 5

description
the h_st32d operation stores all 32 bits of rsrc1 into the memory locations pointed to by the address in rsrc2 + d. the d value is an opcode modifier, must be in the range -256 to 252 inclusive, and must be a multiple of 4. this store operation is performed as little-endian or big-endian depending on the current setting of the bytesex bit in the pcsw.
if h_st32d is misaligned (the memory address computed by rsrc2 + d is not a multiple of 4), the result of h_st32d is undefined, and the mse (misaligned store exception) bit in the pcsw register is set to 1. additionally, if the trpmse (trap on misaligned store exception) bit in pcsw is 1, exception processing will be requested on the next interruptible jump.
the h_st32d operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the addressed memory locations (and the modification of cache if the locations are cacheable). if the lsb of rguard is 1, the store takes effect. if the lsb of rguard is 0, h_st32d has no side effects whatever; in particular, the lru and other status bits in the data cache are not affected.

examples

    initial values                              operation                      result
    r10 = 0xcfc, r80 = 0x44332211               h_st32d(4) r80 r10             [0xd00] ← 0x44, [0xd01] ← 0x33, [0xd02] ← 0x22, [0xd03] ← 0x11
    r50 = 0, r20 = 0xd0b, r70 = 0xaabbccdd      if r50 h_st32d(-8) r70 r20     no change, since guard is false
    r60 = 1, r30 = 0xd0c, r70 = 0xaabbccdd      if r60 h_st32d(-8) r70 r30     [0xd04] ← 0xaa, [0xd05] ← 0xbb, [0xd06] ← 0xcc, [0xd07] ← 0xdd

see also
    st32 st32d st16 st16d st8 st8d readpcsw ijmpf
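The byte placement used by h_st16d and h_st32d can be sketched as follows. This is an illustrative model only: the function `store` and its dictionary-based memory are hypothetical, and the sketch assumes that the byte offset is combined with bs by exclusive-or, which is consistent with the big-endian results in the examples. Hardware leaves a misaligned store undefined; the sketch raises an error instead.

```python
def store(mem: dict, addr: int, value: int, size: int, little_endian: bool) -> None:
    """Store the low `size` bytes (2 or 4) of value at addr.

    bs is size-1 for little-endian and 0 for big-endian; offset i (counted
    from the most-significant byte) is written at addr + (i XOR bs).
    """
    if size not in (2, 4):
        raise ValueError("size must be 2 or 4")
    if addr % size:
        # Assumption of this sketch: reject what the hardware calls undefined.
        raise ValueError("misaligned store")
    bs = size - 1 if little_endian else 0
    for i in range(size):
        byte = (value >> (8 * (size - 1 - i))) & 0xFF
        mem[addr + (i ^ bs)] = byte


if __name__ == "__main__":
    memory = {}
    store(memory, 0xCFC + 4, 0x44332211, 4, little_endian=False)
    print({hex(a): hex(b) for a, b in sorted(memory.items())})
```

With `little_endian=True` the same call places 0x11 at the lowest address, which is the only difference between the two bytesex settings.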
philips semiconductors pnx1300/01/02/11 dspcpu operations preliminary specification a-72 hardware 8-bit stor e with displacement syntax [ if r guard ] h_st8d( d ) r src1 r src2 function if r guard then mem[r src2 + d ] r src1 <7:0> attributes function unit dmem operation code 29 number of operands 2 modifier 7 bits modifier range ?64..63 latency n/a issue slots 4, 5 description the h_st8d operation stores the least- significant 8-bit byte of r src1 into the memory locati on pointed to by the address formed from the sum r src2 + d . the value of the opcode modifier d must be in the range -64 and 63 inclusive. this operation does not depend on the bytesex bit in the pcsw since only a single byte is stored. the h_st8d operation optionally takes a guard, specified in r guard . if a guard is present, its lsb controls the modification of the addressed memory lo cation (and the modification of cache if the location is cacheable). if the lsb of r guard is 1, the store takes effect. if the lsb of r guard is 0, h_st8d has no side effects whatever; in particular, the lru and other status bits in the data cache are not affected. examples initial values operation result r10 = 0xd00, r80 = 0x44332211 h_st8d(3) r80 r10 [0xd03] 0x11 r50 = 0, r20 = 0xd01, r70 = 0xaabbccdd if r50 h_st8d(-4) r70 r20 no change, since guard is false r60 = 1, r30 = 0xd02, r70 = 0xaabbccdd if r60 h_st8d(-4) r70 r30 [0xcfe] 0xdd see also st8 st8d st16 st16d st32 st32d h_st8d
hicycles
read clock cycle counter, most-significant word

syntax
    [ if rguard ] hicycles → rdest

function
    if rguard then rdest ← cccount<63:32>

attributes
    function unit       fcomp
    operation code      155
    number of operands  0
    modifier            no
    modifier range      -
    latency             1
    issue slots         3

description
refer to section 3.1.5, "cccount - clock cycle counter," for a description of the cccount operation. the hicycles operation copies the high 32 bits of the slave register clock cycle counter (cccount) to the destination register, rdest. the contents of the master counter are transferred to the slave cccount register only on a successful interruptible jump and on processor reset. thus, if cycles and hicycles are executed without intervening interruptible jumps, the operation pair is guaranteed to be a coherent sample of the master clock-cycle counter. the master counter increments on all cycles (processor-stall and non-stall) if pcsw.cs = 1; otherwise, the counter increments only on non-stall cycles.
the hicycles operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

examples

    initial values                                  operation                  result
    cccount_hr = 0xabcdefff12345678                 hicycles → r60             r60 ← 0xabcdefff
    r10 = 0, cccount_hr = 0xabcdefff12345678        if r10 hicycles → r70      no change, since guard is false
    r20 = 1, cccount_hr = 0xabcdefff12345678        if r20 hicycles → r100     r100 ← 0xabcdefff

see also
    cycles curcycles writepcsw
philips semiconductors pnx1300/01/02/11 dspcpu operations preliminary specification a-74 absolute value pseudo-op for h_iabs syntax [ if r guard ] iabs r src1 r dest function if r guard then { if rsrc1 < 0 then r dest ?rsrc1 else r dest rsrc1 } attributes function unit alu operation code 44 number of operands 1 modifier no modifier range ? latency 1 issue slots 1, 2, 3, 4, 5 description the iabs operation is a pseudo operation transformed by the scheduler into an h_iabs with zero as the first argument and a second argument equal to the iabs argument. (note: pseudo operations cannot be used in assembly source files.) the iabs operation computes the absolute value of r src1 and stores the result into r dest . the argument is a signed integer; the result is an unsigned integer. the iabs operation optionally takes a guard, specified in r guard . if a guard is present, its lsb controls the modification of the destinatio n register. if the lsb of r guard is 1, r dest is written; otherwise, r dest is not changed. examples initial values operation result r30 = 0xffffffff iabs r30 r60 r60 0x00000001 r10 = 0, r40 = 0xfffffff4 if r10 iabs r40 r80 no change, since guard is false r20 = 1, r40 = 0xfffffff4 if r20 iabs r40 r90 r90 0xc r50 = 0x80000001 iabs r50 r100 r100 0x7fffffff r60 = 0x80000000 iabs r60 r110 r110 0x80000000 r20 = 1 iabs r20 r120 r120 1 see also h_iabs dspiabs dspidualabs fabsval iabs
pnx1300/01/02/11 data book philips semiconductors a-75 preliminary specification signed add syntax [ if r guard ] iadd r src1 r src2 r dest function if r guard then r dest r src1 + r src2 attributes function unit alu operation code 12 number of operands 2 modifier no modifier range ? latency 1 issue slots 1, 2, 3, 4, 5 description the iadd operation computes the sum r src1 +r src2 and stores the result into r dest . the operands can be either both signed or unsigned integers. no over flow or underflow detection is performed. the iadd operation optionally takes a guard, specified in r guard . if a guard is present, its lsb controls the modification of the destination register. if the lsb of r guard is 1, r dest is written; otherwise, r dest is not changed. examples initial values operation result r60 = 0x100 iadd r60 r60 r80 r80 0x200 r10 = 0, r60 = 0x100, r30 = 0xf11 if r10 iadd r60 r30 r50 no change, since guard is false r20 = 1, r60 = 0x100, r30 = 0xf11 if r20 iadd r60 r30 r90 r90 0x1011 r70 = 0xffffff00, r40 = 0xffffff9c iadd r70 r40 r100 r100 0xfffffe9c see also iaddi carry dspiadd dspidualadd fadd iadd
iaddi
add with immediate

syntax
    [ if rguard ] iaddi(n) rsrc1 → rdest

function
    if rguard then rdest ← rsrc1 + n

attributes
    function unit       alu
    operation code      5
    number of operands  1
    modifier            7 bits
    modifier range      0..127
    latency             1
    issue slots         1, 2, 3, 4, 5

description
the iaddi operation adds the immediate modifier n to the single argument in rsrc1 and stores the result in rdest. the value of n must be between 0 and 127, inclusive.
the iaddi operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

examples

    initial values              operation                     result
    r30 = 0xf11                 iaddi(127) r30 → r70          r70 ← 0xf90
    r10 = 0, r40 = 0xffffff9c   if r10 iaddi(1) r40 → r80     no change, since guard is false
    r20 = 1, r40 = 0xffffff9c   if r20 iaddi(1) r40 → r90     r90 ← 0xffffff9d
    r50 = 0x1000                iaddi(15) r50 → r120          r120 ← 0x100f
    r60 = 0xfffffff0            iaddi(2) r60 → r110           r110 ← 0xfffffff2
    r60 = 0xfffffff0            iaddi(17) r60 → r120          r120 ← 1

see also
    iadd carry
pnx1300/01/02/11 data book philips semiconductors a-77 preliminary specification signed average syntax [ if r guard ] iavgonep r src1 r src2 r dest function if r guard then r dest (sign_ext32to64(r src1 ) + sign_ext32to64(r src2 ) + 1) >> 1; attributes function unit dspalu operation code 25 number of operands 2 modifier no modifier range ? latency 2 issue slots 1, 3 description as shown below, the iavgonep operation returns the average of the two arguments. this operation computes the sum r src1 +r src2 +1, shifts the sum right by 1 bit, and stores the result into r dest . the operands are signed integers. the iavgonep operation optionally takes a guard, specified in r guard . if a guard is present, its lsb controls the modification of the destination register. if the lsb of r guard is 1, r dest is written; otherwise, r dest is not changed. examples initial values operation result r60 = 0x10, r70 = 0x20 iavgonep r60 r70 r80 r80 0x18 r10 = 0, r60 = 0x10, r30 = 0x20 if r10 iavgonep r60 r30 r50 no change, since guard is false r20 = 1, r60 = 0x9, r30 = 0x20 if r20 iavgonep r60 r30 r90 r90 0x15 r70 = 0xfffffff7, r40 = 0x2 iavgonep r70 r40 r100 r100 0xfffffffd r70 = 0xfffffff7, r40 = 0x3 iavgonep r70 r40 r100 r100 0xfffffffd 0 31 r src1 0 31 r src2 0 31 r dest + 0 32 full precision 33-bit result s s shift down one bit 1 signed signed signed signed see also quadavg iadd iavgonep
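The rounding behavior of iavgonep, including for negative sums, can be checked with a short model. This is a sketch under the assumption that the one-bit shift of the full-precision sum is an arithmetic (sign-preserving) shift, which matches the example results; the helper `to_signed32` is hypothetical.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x: int) -> int:
    """Interpret the low 32 bits of x as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def iavgonep(src1: int, src2: int) -> int:
    """(src1 + src2 + 1) >> 1 on signed operands, computed at full
    precision so the intermediate sum cannot overflow."""
    total = to_signed32(src1) + to_signed32(src2) + 1
    return (total >> 1) & MASK32  # Python's >> is arithmetic on ints


if __name__ == "__main__":
    print(hex(iavgonep(0xFFFFFFF7, 0x2)), hex(iavgonep(0xFFFFFFF7, 0x3)))
```

Both calls in the usage lines return 0xfffffffd, as in the last two example rows: the sums -6 and -5 each shift down to -3.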
ibytesel
signed select byte

syntax
    [ if rguard ] ibytesel rsrc1 rsrc2 → rdest

function
    if rguard then {
        if rsrc2 = 0 then rdest ← sign_ext8to32(rsrc1<7:0>)
        else if rsrc2 = 1 then rdest ← sign_ext8to32(rsrc1<15:8>)
        else if rsrc2 = 2 then rdest ← sign_ext8to32(rsrc1<23:16>)
        else if rsrc2 = 3 then rdest ← sign_ext8to32(rsrc1<31:24>)
    }

attributes
    function unit       alu
    operation code      56
    number of operands  2
    modifier            no
    modifier range      -
    latency             1
    issue slots         1, 2, 3, 4, 5

description
as shown below, the ibytesel operation selects one byte from the argument, rsrc1, sign-extends the byte to 32 bits, and stores the result in rdest. the value of rsrc2 determines which byte is selected, with rsrc2 = 0 selecting the least-significant byte of rsrc1 and rsrc2 = 3 selecting the most-significant byte of rsrc1. if rsrc2 is not between 0 and 3 inclusive, the result of ibytesel is undefined.
the ibytesel operation optionally takes a guard, specified in rguard. if a guard is present, its lsb (least-significant bit) controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

[figure: the byte of rsrc1 selected by rsrc2<1:0> is copied to rdest<7:0> and its sign bit is replicated into rdest<31:8>]

examples

    initial values                          operation                          result
    r30 = 0x44332211, r40 = 1               ibytesel r30 r40 → r50             r50 ← 0x00000022
    r10 = 0, r60 = 0xddccbbaa, r70 = 2      if r10 ibytesel r60 r70 → r80      no change, since guard is false
    r20 = 1, r60 = 0xddccbbaa, r70 = 2      if r20 ibytesel r60 r70 → r90      r90 ← 0xffffffcc
    r100 = 0xffffff7f, r110 = 0             ibytesel r100 r110 → r120          r120 ← 0x0000007f

see also
    ubytesel sex8 packbytes
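The byte selection and sign extension can be modeled in a few lines. This is a sketch, not a reference implementation; because the hardware result is undefined for a selector outside 0..3, the sketch chooses to raise an error in that case.

```python
MASK32 = 0xFFFFFFFF


def ibytesel(src1: int, src2: int) -> int:
    """Select byte number src2 (0 = least significant) of src1 and
    sign-extend it to 32 bits."""
    if src2 not in (0, 1, 2, 3):
        # Undefined on hardware; rejecting it is a choice of this sketch.
        raise ValueError("selector must be 0..3")
    byte = (src1 >> (8 * src2)) & 0xFF
    signed = byte - 0x100 if byte & 0x80 else byte
    return signed & MASK32


if __name__ == "__main__":
    print(hex(ibytesel(0xDDCCBBAA, 2)))
```

Selecting byte 2 of 0xddccbbaa gives 0xcc, whose top bit is set, so the result is 0xffffffcc as in the guarded example.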
iclipi
clip signed to signed

syntax
    [ if rguard ] iclipi rsrc1 rsrc2 → rdest

function
    if rguard then rdest ← min(max(rsrc1, -rsrc2 - 1), rsrc2)

attributes
    function unit       dspalu
    operation code      74
    number of operands  2
    modifier            no
    modifier range      -
    latency             2
    issue slots         1, 3

description
the iclipi operation returns the value of rsrc1 clipped into the signed integer range (-rsrc2 - 1) to rsrc2, inclusive. the argument rsrc1 is considered a signed integer; rsrc2 is considered an unsigned integer and must have a value between 0 and 0x7fffffff inclusive.
the iclipi operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

examples

    initial values                              operation                          result
    r30 = 0x80, r40 = 0x7f                      iclipi r30 r40 → r50               r50 ← 0x7f
    r10 = 0, r60 = 0x12345678, r70 = 0xabc      if r10 iclipi r60 r70 → r80        no change, since guard is false
    r20 = 1, r60 = 0x12345678, r70 = 0xabc      if r20 iclipi r60 r70 → r90        r90 ← 0xabc
    r100 = 0x80000000, r110 = 0x3fffff          iclipi r100 r110 → r120            r120 ← 0xffc00000

see also
    uclipi uclipu imin imax
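The clipping formula min(max(rsrc1, -rsrc2 - 1), rsrc2) translates directly into code. The following sketch uses a hypothetical helper `to_signed32` and raises an error when rsrc2 is outside the range the description requires.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x: int) -> int:
    """Interpret the low 32 bits of x as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def iclipi(src1: int, src2: int) -> int:
    """Clip signed src1 into [-src2 - 1, src2]."""
    if not 0 <= src2 <= 0x7FFFFFFF:
        raise ValueError("src2 must be in 0..0x7fffffff")
    value = to_signed32(src1)
    return min(max(value, -src2 - 1), src2) & MASK32


if __name__ == "__main__":
    print(hex(iclipi(0x80000000, 0x3FFFFF)))
```

The usage line reproduces the last example: the most negative input is clipped to -0x400000, which is 0xffc00000 as a 32-bit pattern.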
iclr
invalidate all instruction cache blocks

syntax
    [ if rguard ] iclr

function
    if rguard then {
        block ← 0
        for all blocks in instruction cache {
            icache_reset_valid_block(block)
            block ← block + 1
        }
    }

attributes
    function unit       branch
    operation code      184
    number of operands  0
    modifier            no
    modifier range      -
    latency             n/a
    issue slots         2, 3, 4

description
the iclr operation resets the valid bits of all blocks in the instruction cache. iclr does clear the valid bits of locked blocks. iclr does not change the replacement status of instruction-cache blocks.
iclr ensures coherency between caches and main memory by discarding all pending prefetch operations. the side-effect timing of iclr is such that if instruction i performs an iclr, instructions i, i+1, and i+2 will be included in the discard from the instruction cache, but i+3 will be retained.
the iclr operation optionally takes a guard, specified in rguard. iclr has no destination register; if a guard is present, its lsb controls whether the invalidation is performed. if the lsb of rguard is 1, the instruction cache is invalidated; otherwise, iclr has no effect.

examples

    initial values      operation       result
                        iclr
    r10 = 0             if r10 iclr     no change and no stall cycles, since guard is false
    r20 = 1             if r20 iclr

see also
    dcb dinvalid
pnx1300/01/02/11 data book philips semiconductors a-81 preliminary specification identity pseudo-op for iadd syntax [ if r guard ] ident r src1 r dest function if r guard then r dest r src1 attributes function unit alu operation code 12 number of operands 1 modifier no modifier range ? latency 1 issue slots 1, 2, 3, 4, 5 description the ident operation is a pseudo operation transformed by the scheduler into an iadd with r0 (always contains 0) as the first argument and r src1 as the second. (note: pseudo operations cannot be used in assembly source files.) the ident operation copies the argument r src1 to r dest . it is used by the instruction scheduler to implement register to register copying. the ident operation optionally takes a guard, specified in r guard . if a guard is present, its lsb controls the modification of the destination register. if the lsb of r guard is 1, r dest is written; otherwise, r dest is not changed. examples initial values operation result r30 = 0x100 ident r30 r40 r40 0x100 r10 = 0, r50 = 0x12345678 if r10 ident r50 r60 no change, since guard is false r20 = 1, r50 = 0x12345678 if r20 ident r50 r70 r70 0x12345678 see also iadd ident
philips semiconductors pnx1300/01/02/11 dspcpu operations preliminary specification a-82 signed compare equal syntax [ if r guard ] ieql r src1 r src2 r dest function if r guard then { if r src1 = r src2 then r dest 1 else r dest 0 } attributes function unit alu operation code 37 number of operands 2 modifier no modifier range ? latency 1 issue slots 1, 2, 3, 4, 5 description the ieql operation sets the destination register, r dest , to 1 if the first argument, r src1 , is equal to the second argument, r src2 ; otherwise, r dest is set to 0. the arguments are treated as signed integers. the ieql operation optionally takes a guard, specified in r guard . if a guard is present, its lsb controls the modification of the destinatio n register. if the lsb of r guard is 1, r dest is written; otherwise, r dest is not changed. examples initial values operation result r30 = 3, r40 = 4 ieql r30 r40 r80 r80 0 r10 = 0, r60 = 0x100, r30 = 3 if r10 ieql r60 r30 r50 no change, since guard is false r20 = 1, r50 = 0x1000, r60 = 0x1000 if r20 ieql r50 r60 r90 r90 1 r70 = 0x80000000, r40 = 4 ieql r70 r40 r100 r100 0 r70 = 0x80000000 ieql r70 r70 r110 r110 1 see also igeq ueql ieqli ineq ieql
pnx1300/01/02/11 data book philips semiconductors a-83 preliminary specification signed compare equal with immediate syntax [ if r guard ] ieqli( n ) r src1 r dest function if r guard then { if r src1 = n then r dest 1 else r dest 0 } attributes function unit alu operation code 4 number of operands 1 modifier 7 bits modifier range ?64..63 latency 1 issue slots 1, 2, 3, 4, 5 description the ieqli operation sets the destination register, r dest , to 1 if the first argument, r src1 , is equal to the opcode modifier, n ; otherwise, r dest is set to 0. the arguments are treated as signed integers. the ieqli operation optionally takes a guard, specified in r guard . if a guard is present, its lsb controls the modification of the destination register. if the lsb of r guard is 1, r dest is written; otherwise, r dest is not changed. examples initial values operation result r30 = 3 ieqli(2) r30 r80 r80 0 r30 = 3 ieqli(3) r30 r90 r90 1 r30 = 3 ieqli(4) r30 r100 r100 0 r10 = 0, r40 = 0x100 if r10 ieqli(63) r40 r50 no change, since guard is false r20 = 1, r40 = 0x100 if r20 ieqli(63) r40 r100 r100 0 r60 = 0xffffffc0 ieqli(-64) r60 r120 r120 1 see also ieql igeqi ueqli ineqi ieqli
ifir16
sum of products of signed 16-bit halfwords

syntax
    [ if rguard ] ifir16 rsrc1 rsrc2 → rdest

function
    if rguard then
        rdest ← sign_ext16to32(rsrc1<31:16>) × sign_ext16to32(rsrc2<31:16>) +
                sign_ext16to32(rsrc1<15:0>) × sign_ext16to32(rsrc2<15:0>)

attributes
    function unit       dspmul
    operation code      93
    number of operands  2
    modifier            no
    modifier range      -
    latency             3
    issue slots         2, 3

description
as shown below, the ifir16 operation computes two separate products of the two pairs of corresponding 16-bit halfwords of rsrc1 and rsrc2; the two products are summed, and the result is written to rdest. all values are considered signed; thus, the intermediate products and the final sum of products are signed. all intermediate computations are performed without loss of precision; the final sum of products is clipped into the range [0x80000000..0x7fffffff] before being written into rdest.
the ifir16 operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

[figure: the two signed halfword products form a full-precision 33-bit sum, which is clipped to [-2^31..2^31 - 1] and written to rdest]

examples

    initial values                                  operation                        result
    r30 = 0x00020003, r40 = 0x00010002              ifir16 r30 r40 → r50             r50 ← 0x8
    r10 = 0, r60 = 0xff9c0064, r70 = 0x0064ff9c     if r10 ifir16 r60 r70 → r80      no change, since guard is false
    r20 = 1, r60 = 0xff9c0064, r70 = 0x0064ff9c     if r20 ifir16 r60 r70 → r90      r90 ← 0xffffb1e0
    r30 = 0x00020003, r70 = 0x0064ff9c              ifir16 r30 r70 → r100            r100 ← 0xffffff9c

see also
    ifir8ii ifir8ui ufir8uu ifir16
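The clipping step matters only when both halfword products reach their maximum, which a small model makes easy to see. This is an illustrative sketch; the helper `s16` is hypothetical.

```python
MASK32 = 0xFFFFFFFF


def s16(x: int) -> int:
    """Interpret the low 16 bits of x as a signed halfword."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def ifir16(src1: int, src2: int) -> int:
    """Sum of the two signed halfword products, clipped to 32 bits."""
    total = s16(src1 >> 16) * s16(src2 >> 16) + s16(src1) * s16(src2)
    clipped = max(-(1 << 31), min(total, (1 << 31) - 1))
    return clipped & MASK32


if __name__ == "__main__":
    print(hex(ifir16(0xFF9C0064, 0x0064FF9C)))  # (-100*100) + (100*-100)
    print(hex(ifir16(0x80008000, 0x80008000)))  # 2 * 2^30 exceeds 2^31 - 1
```

The second call is the one case where the full-precision sum (2^31) does not fit a signed 32-bit word, so the result saturates at 0x7fffffff.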
pnx1300/01/02/11 data book philips semiconductors a-85 preliminary specification signed sum of produc ts of signed bytes syntax [ if r guard ] ifir8ii r src1 r src2 r dest function if r guard then r dest sign_ext8to32(r src1 <31:24>) sign_ext8to32(r src2 <31:24>) + sign_ext8to32(r src1 <23:16>) sign_ext8to32(r src2 <23:16>) + sign_ext8to32(r src1 <15:8>) sign_ext8to32(r src2 <15:8>) + sign_ext8to32(r src1 <7:0>) sign_ext8to32(r src2 <7:0>) attributes function unit dspmul operation code 92 number of operands 2 modifier no modifier range ? latency 3 issue slots 2, 3 description as shown below, the ifir8ii operation computes four separate products of the four pairs of corresponding 8-bit bytes of r src1 and r src2 ; the four products are summed, and the result is written to r dest . all values are considered signed; thus, the intermediate products and the final sum of products are signed. all computations are performed without loss of precision. the ifir8ii operation optionally takes a guard, specified in r guard . if a guard is present, its lsb controls the modification of the destination register. if the lsb of r guard is 1, r dest is written; otherwise, r dest is not changed. examples initial values operation result r70 = 0x0afb14f6, r30 = 0x0a0a1414 ifir8ii r70 r30 r90 r90 0xfa r10 = 0, r70 = 0x 0afb14f6, r30 = 0x0a0a1414 if r10 ifir8ii r70 r30 r100 no change, since guard is false r20 = 1, r80 = 0x649c649c, r40 = 0x9c649c64 if r20 ifir8ii r80 r40 r110 r110 0xffff63c0 r50 = 0x80808080, r60 = 0xf fffffff ifir8ii r50 r60 r120 r120 0x200 0 15 31 r src1 0 15 31 r src2 0 31 r dest + 23 7 23 7 signed signed signed signed signed signed signed signed signed see also ifir8ui ufir8uu ifir16 ufir16 ifir8ii
philips semiconductors pnx1300/01/02/11 dspcpu operations preliminary specification a-86 signed sum of products of unsigned/signed bytes syntax [ if r guard ] ifir8ui r src1 r src2 r dest function if r guard then r dest zero_ext8to32(r src1 <31:24>) sign_ext8to32(r src2 <31:24>) + zero_ext8to32(r src1 <23:16>) sign_ext8to32(r src2 <23:16>) + zero_ext8to32(r src1 <15:8>) sign_ext8to32(r src2 <15:8>) + zero_ext8to32(r src1 <7:0>) sign_ext8to32(r src2 <7:0>) attributes function unit dspmul operation code 91 number of operands 2 modifier no modifier range ? latency 3 issue slots 2, 3 description as shown below, the ifir8ui operation computes four separate products of the four pairs of corresponding 8-bit bytes of r src1 and r src2 ; the four products are summed, and the result is written to r dest . the bytes from r src1 are considered unsigned, but the bytes from r src2 are considered signed; thus, the intermediate products and the final sum of products are signed. all computatio ns are performed without loss of precision. the ifir8ui operation optionally takes a guard, specified in r guard . if a guard is present, its lsb controls the modification of the destinatio n register. if the lsb of r guard is 1, r dest is written; otherwise, r dest is not changed. examples initial values operation result r70 = 0x0afb14f6, r30 = 0x0a0a1414 ifir8ui r30 r70 r90 r90 0xfa r10 = 0, r70 = 0x0afb14f6, r30 = 0x0a0a1414 if r10 ifir8ui r30 r70 r100 no change, since guard is false r20 = 1, r80 = 0x649c649c, r40 = 0x9c649c64 if r20 ifir8ui r40 r80 r110 r110 0x2bc0 r50 = 0x80808080, r60 = 0xff ffffff ifir8ui r60 r50 r120 r120 0xfffe0200 0 15 31 r src1 0 15 31 r src2 0 31 r dest + 23 7 23 7 unsigned unsigned unsigned unsigned signed signed signed signed signed see also ifir8ii ufir8uu ifir16 ufir16 ifir8ui
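ifir8ii and ifir8ui differ only in how the bytes of the first operand are interpreted. A minimal sketch of both follows; the helpers `bytes_of` and `s8` are hypothetical names introduced for illustration.

```python
MASK32 = 0xFFFFFFFF


def bytes_of(word: int) -> list[int]:
    """Split a 32-bit word into four unsigned bytes, most significant first."""
    return [(word >> shift) & 0xFF for shift in (24, 16, 8, 0)]


def s8(byte: int) -> int:
    """Interpret an unsigned byte value as a signed byte."""
    return byte - 0x100 if byte & 0x80 else byte


def ifir8ii(src1: int, src2: int) -> int:
    """Signed x signed byte products, summed."""
    total = sum(s8(a) * s8(b) for a, b in zip(bytes_of(src1), bytes_of(src2)))
    return total & MASK32


def ifir8ui(src1: int, src2: int) -> int:
    """Unsigned (src1) x signed (src2) byte products, summed."""
    total = sum(a * s8(b) for a, b in zip(bytes_of(src1), bytes_of(src2)))
    return total & MASK32


if __name__ == "__main__":
    print(hex(ifir8ii(0x649C649C, 0x9C649C64)))
    print(hex(ifir8ui(0x9C649C64, 0x649C649C)))
```

The same register contents give 0xffff63c0 for ifir8ii but 0x2bc0 for ifir8ui, because the byte 0x9c reads as -100 when signed and as 156 when unsigned.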
ifixieee: convert floating-point to integer using pcsw rounding mode

syntax: [ if rguard ] ifixieee rsrc1 rdest

function:
if rguard then {
  rdest ← (long) ((float)rsrc1)
}

attributes: function unit falu; operation code 121; number of operands 1; modifier no; modifier range -; latency 3; issue slots 1, 4

description:
the ifixieee operation converts the single-precision ieee floating-point value in rsrc1 to a signed integer and writes the result into rdest. rounding is according to the ieee rounding mode bits in pcsw. if rsrc1 is denormalized, zero is substituted before conversion, and the ifz flag in the pcsw is set. if ifixieee causes an ieee exception, such as overflow or underflow, the corresponding exception flags in the pcsw are set. the pcsw exception flags are sticky: the flags can be set as a side-effect of any floating-point operation but can only be reset by an explicit writepcsw operation. the update of the pcsw exception flags occurs at the same time as rdest is written. if any other floating-point compute operations update the pcsw at the same time, the net result in each exception flag is the logical or of all simultaneous updates ored with the existing pcsw value for that exception flag.
the ifixieeeflags operation computes the exception flags that would result from an individual ifixieee.
the ifixieee operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest and the exception flags in pcsw are written; otherwise, rdest is not changed and the operation does not affect the exception flags in pcsw.

examples (initial values | operation | result):
r30 = 0x40400000 (3.0) | ifixieee r30 r100 | r100 ← 3
r35 = 0x40247ae1 (2.57) | ifixieee r35 r102 | r102 ← 3, inx flag set
r10 = 0, r40 = 0xff4fffff (-3.402823466e+38) | if r10 ifixieee r40 r105 | no change, since guard is false
r20 = 1, r40 = 0xff4fffff (-3.402823466e+38) | if r20 ifixieee r40 r110 | r110 ← 0x80000000 (-2^31), inv flag set
r45 = 0x7f800000 (+inf) | ifixieee r45 r112 | r112 ← 0x7fffffff (2^31 - 1), inv flag set
r50 = 0xbfc147ae (-1.51) | ifixieee r50 r115 | r115 ← -2, inx flag set
r60 = 0x00400000 (5.877471754e-39) | ifixieee r60 r117 | r117 ← 0, ifz set
r70 = 0xffffffff (qnan) | ifixieee r70 r120 | r120 ← 0, inv flag set
r80 = 0xffbfffff (snan) | ifixieee r80 r122 | r122 ← 0, inv flag set

see also: ufixieee ifixrz ufixrz
ifixieeeflags: ieee status flags from convert floating-point to integer using pcsw rounding mode

syntax: [ if rguard ] ifixieeeflags rsrc1 rdest

function:
if rguard then
  rdest ← ieee_flags((long) ((float)rsrc1))

attributes: function unit falu; operation code 122; number of operands 1; modifier no; modifier range -; latency 3; issue slots 1, 4

description:
the ifixieeeflags operation computes the ieee exceptions that would result from converting the single-precision ieee floating-point value in rsrc1 to a signed integer, and an integer bit vector representing the computed exception flags is written into rdest. the bit vector stored in rdest has the same format as the ieee exception bits in the pcsw. the exception flags in pcsw are left unchanged by this operation. rounding is according to the ieee rounding mode bits in pcsw. if rsrc1 is denormalized, zero is substituted before computing the conversion, and the ifz bit in the result is set.
the ifixieeeflags operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

examples (initial values | operation | result):
r30 = 0x40400000 (3.0) | ifixieeeflags r30 r100 | r100 ← 0
r35 = 0x40247ae1 (2.57) | ifixieeeflags r35 r102 | r102 ← 0x02 (inx)
r10 = 0, r40 = 0xff4fffff (-3.402823466e+38) | if r10 ifixieeeflags r40 r105 | no change, since guard is false
r20 = 1, r40 = 0xff4fffff (-3.402823466e+38) | if r20 ifixieeeflags r40 r110 | r110 ← 0x10 (inv)
r45 = 0x7f800000 (+inf) | ifixieeeflags r45 r112 | r112 ← 0x10 (inv)
r50 = 0xbfc147ae (-1.51) | ifixieeeflags r50 r115 | r115 ← 0x02 (inx)
r60 = 0x00400000 (5.877471754e-39) | ifixieeeflags r60 r117 | r117 ← 0x20 (ifz)
r70 = 0xffffffff (qnan) | ifixieeeflags r70 r120 | r120 ← 0x10 (inv)
r80 = 0xffbfffff (snan) | ifixieeeflags r80 r122 | r122 ← 0x10 (inv)

[figure: exception-flag bit vector in rdest: bit 6 ofz, bit 5 ifz, bit 4 inv, bit 3 ovf, bit 2 unf, bit 1 inx, bit 0 dbz; bits 31..7 are 0]

see also: ifixieee ufixieeeflags ifixrzflags ufixrzflags
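The bit vector returned by the flags operations can be decoded with a small table. The Python sketch below assumes the bit positions read off this page's figure (labels ofz down to dbz over bits 6 down to 0), which agree with the example values 0x02 (inx), 0x10 (inv) and 0x20 (ifz); `FLAG_BITS` and `decode_flags` are hypothetical names.

```python
# Bit positions as read from the figure on this page (an interpretation of the
# figure labels, cross-checked against the example results).
FLAG_BITS = {"dbz": 0, "inx": 1, "unf": 2, "ovf": 3, "inv": 4, "ifz": 5, "ofz": 6}

def decode_flags(vector):
    """Return the names of the exception flags set in a flags result."""
    return [name for name, bit in FLAG_BITS.items() if (vector >> bit) & 1]

print(decode_flags(0x10))
```

Because the flags operations leave the pcsw untouched, a result like this can be inspected without disturbing the sticky flags.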
ifixrz: convert floating-point to integer with round toward zero

syntax: [ if rguard ] ifixrz rsrc1 rdest

function:
if rguard then {
  rdest ← (long) ((float)rsrc1)
}

attributes: function unit falu; operation code 21; number of operands 1; modifier no; modifier range -; latency 3; issue slots 1, 4

description:
the ifixrz operation converts the single-precision ieee floating-point value in rsrc1 to a signed integer and writes the result into rdest. rounding toward zero is performed; the ieee rounding mode bits in pcsw are ignored. this is the preferred rounding for ansi c. if rsrc1 is denormalized, zero is substituted before conversion, and the ifz flag in the pcsw is set. if ifixrz causes an ieee exception, such as overflow or underflow, the corresponding exception flags in the pcsw are set. the pcsw exception flags are sticky: the flags can be set as a side-effect of any floating-point operation but can only be reset by an explicit writepcsw operation. the update of the pcsw exception flags occurs at the same time as rdest is written. if any other floating-point compute operations update the pcsw at the same time, the net result in each exception flag is the logical or of all simultaneous updates ored with the existing pcsw value for that exception flag.
the ifixrzflags operation computes the exception flags that would result from an individual ifixrz.
the ifixrz operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest and the exception flags in pcsw are written; otherwise, rdest is not changed and the operation does not affect the exception flags in pcsw.

examples (initial values | operation | result):
r30 = 0x40400000 (3.0) | ifixrz r30 r100 | r100 ← 3
r35 = 0x40247ae1 (2.57) | ifixrz r35 r102 | r102 ← 2, inx flag set
r10 = 0, r40 = 0xff4fffff (-3.402823466e+38) | if r10 ifixrz r40 r105 | no change, since guard is false
r20 = 1, r40 = 0xff4fffff (-3.402823466e+38) | if r20 ifixrz r40 r110 | r110 ← 0x80000000 (-2^31), inv flag set
r45 = 0x7f800000 (+inf) | ifixrz r45 r112 | r112 ← 0x7fffffff (2^31 - 1), inv flag set
r50 = 0xbfc147ae (-1.51) | ifixrz r50 r115 | r115 ← -1, inx flag set
r60 = 0x00400000 (5.877471754e-39) | ifixrz r60 r117 | r117 ← 0, ifz set
r70 = 0xffffffff (qnan) | ifixrz r70 r120 | r120 ← 0, inv flag set
r80 = 0xffbfffff (snan) | ifixrz r80 r122 | r122 ← 0, inv flag set

see also: ifixieee ufixieee ufixrz
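The behaviour shown in the ifixrz examples can be modelled roughly as follows. This Python sketch is inferred from the examples only and is not a hardware description: the flag values 0x02, 0x10 and 0x20 are taken from the ifixrzflags examples, and the treatment of an input of exactly -2^31 (converted without raising inv) is an assumption.

```python
import math
import struct

INX, INV, IFZ = 0x02, 0x10, 0x20  # flag values as used in the flags examples

def ifixrz(bits):
    """Return (rdest image, flags) for a 32-bit single-precision bit pattern."""
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    if exponent == 0 and mantissa != 0:      # denormal input is replaced by zero
        return 0, IFZ
    if exponent == 0xFF and mantissa != 0:   # quiet or signalling NaN
        return 0, INV
    value = struct.unpack(">f", struct.pack(">I", bits))[0]
    if value >= 2.0 ** 31:                   # too large (includes +inf): saturate
        return 0x7FFFFFFF, INV
    if value < -(2.0 ** 31):                 # too small (includes -inf): saturate
        return 0x80000000, INV
    truncated = math.trunc(value)            # round toward zero
    flags = INX if truncated != value else 0
    return truncated & 0xFFFFFFFF, flags

print(ifixrz(0x40247AE1))
```

Negative results are returned as 32-bit register images, so -1 appears as 0xffffffff.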
ifixrzflags: ieee status flags from convert floating-point to integer with round toward zero

syntax: [ if rguard ] ifixrzflags rsrc1 rdest

function:
if rguard then
  rdest ← ieee_flags((long) ((float)rsrc1))

attributes: function unit falu; operation code 129; number of operands 1; modifier no; modifier range -; latency 3; issue slots 1, 4

description:
the ifixrzflags operation computes the ieee exceptions that would result from converting the single-precision ieee floating-point value in rsrc1 to a signed integer, and an integer bit vector representing the computed exception flags is written into rdest. the bit vector stored in rdest has the same format as the ieee exception bits in the pcsw. the exception flags in pcsw are left unchanged by this operation. rounding toward zero is performed; the ieee rounding mode bits in pcsw are ignored. if rsrc1 is denormalized, zero is substituted before computing the conversion, and the ifz bit in the result is set.
the ifixrzflags operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

examples (initial values | operation | result):
r30 = 0x40400000 (3.0) | ifixrzflags r30 r100 | r100 ← 0
r35 = 0x40247ae1 (2.57) | ifixrzflags r35 r102 | r102 ← 0x02 (inx)
r10 = 0, r40 = 0xff4fffff (-3.402823466e+38) | if r10 ifixrzflags r40 r105 | no change, since guard is false
r20 = 1, r40 = 0xff4fffff (-3.402823466e+38) | if r20 ifixrzflags r40 r110 | r110 ← 0x10 (inv)
r45 = 0x7f800000 (+inf) | ifixrzflags r45 r112 | r112 ← 0x10 (inv)
r50 = 0xbfc147ae (-1.51) | ifixrzflags r50 r115 | r115 ← 0x02 (inx)
r60 = 0x00400000 (5.877471754e-39) | ifixrzflags r60 r117 | r117 ← 0x20 (ifz)
r70 = 0xffffffff (qnan) | ifixrzflags r70 r120 | r120 ← 0x10 (inv)
r80 = 0xffbfffff (snan) | ifixrzflags r80 r122 | r122 ← 0x10 (inv)

[figure: exception-flag bit vector in rdest: bit 6 ofz, bit 5 ifz, bit 4 inv, bit 3 ovf, bit 2 unf, bit 1 inx, bit 0 dbz; bits 31..7 are 0]

see also: ifixrz ufixrzflags ifixieeeflags ufixieeeflags
iflip: if non-zero negate

syntax: [ if rguard ] iflip rsrc1 rsrc2 rdest

function:
if rguard then {
  if rsrc1 = 0 then
    rdest ← rsrc2
  else
    rdest ← -rsrc2
}

attributes: function unit dspalu; operation code 77; number of operands 2; modifier no; modifier range -; latency 2; issue slots 1, 3

description:
the iflip operation copies rsrc2 to rdest if rsrc1 = 0; otherwise (if rsrc1 != 0), rdest is set to the two's complement of rsrc2. all values are signed integers.
the iflip operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

examples (initial values | operation | result):
r30 = 0, r40 = 1 | iflip r30 r40 r50 | r50 ← 0x1
r10 = 0, r60 = 0xffff0000, r70 = 0xabc | if r10 iflip r60 r70 r80 | no change, since guard is false
r20 = 1, r60 = 0xffff0000, r70 = 0xabc | if r20 iflip r60 r70 r90 | r90 ← 0xfffff544
r30 = 0, r100 = 0xffffff9c | iflip r30 r100 r110 | r110 ← 0xffffff9c
r40 = 1, r110 = 0xffffffff | iflip r40 r110 r120 | r120 ← 0x1

see also: inonzero izero
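As a quick check of the iflip examples, the operation can be modelled in Python on 32-bit register images. The function below is an illustrative sketch; masking with 0xffffffff stands in for the 32-bit register width.

```python
def iflip(rsrc1, rsrc2):
    """Copy rsrc2 when rsrc1 is zero, otherwise return its two's complement."""
    if rsrc1 == 0:
        return rsrc2 & 0xFFFFFFFF
    return (-rsrc2) & 0xFFFFFFFF  # two's-complement negation within 32 bits

print(hex(iflip(0xFFFF0000, 0xABC)))
```

Note that any non-zero rsrc1 triggers the negation, not just a set lsb.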
ifloat: convert signed integer to floating-point

syntax: [ if rguard ] ifloat rsrc1 rdest

function:
if rguard then {
  rdest ← (float) ((long)rsrc1)
}

attributes: function unit falu; operation code 20; number of operands 1; modifier no; modifier range -; latency 3; issue slots 1, 4

description:
the ifloat operation converts the signed integer value in rsrc1 to single-precision ieee floating-point format and writes the result into rdest. rounding is according to the ieee rounding mode bits in pcsw. if ifloat causes an ieee exception, such as inexact, the corresponding exception flags in the pcsw are set. the pcsw exception flags are sticky: the flags can be set as a side-effect of any floating-point operation but can only be reset by an explicit writepcsw operation. the update of the pcsw exception flags occurs at the same time as rdest is written. if any other floating-point compute operations update the pcsw at the same time, the net result in each exception flag is the logical or of all simultaneous updates ored with the existing pcsw value for that exception flag.
the ifloatflags operation computes the exception flags that would result from an individual ifloat.
the ifloat operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest and the exception flags in pcsw are written; otherwise, rdest is not changed and the operation does not affect the exception flags in pcsw.

examples (initial values | operation | result):
r30 = 3 | ifloat r30 r100 | r100 ← 0x40400000 (3.0)
r40 = 0xffffffff (-1) | ifloat r40 r105 | r105 ← 0xbf800000 (-1.0)
r10 = 0, r50 = 0xfffffffd | if r10 ifloat r50 r110 | no change, since guard is false
r20 = 1, r50 = 0xfffffffd | if r20 ifloat r50 r115 | r115 ← 0xc0400000 (-3.0)
r60 = 0x7fffffff (2147483647) | ifloat r60 r117 | r117 ← 0x4f000000 (2.147483648e+9), inx flag set
r70 = 0x80000000 (-2147483648) | ifloat r70 r120 | r120 ← 0xcf000000 (-2.147483648e+9)
r80 = 0x7ffffff1 (2147483633) | ifloat r80 r122 | r122 ← 0x4f000000 (2.147483648e+9), inx flag set

see also: ufloat ifloatrz ufloatrz ifixieee ifloatflags
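The ifloat examples correspond to round-to-nearest conversion, which can be reproduced in Python with the standard `struct` module. This is a sketch under the assumption that the pcsw rounding mode is round-to-nearest (the other modes are not modelled), and the value 0x02 for inx is taken from the ifloatflags examples.

```python
import struct

INX = 0x02  # inexact flag value as shown in the ifloatflags examples

def ifloat(rsrc1):
    """Return (single-precision bit pattern, flags) for a signed 32-bit input."""
    value = rsrc1 - (1 << 32) if rsrc1 & 0x80000000 else rsrc1
    # Packing as a 4-byte float rounds to the nearest representable value.
    bits = struct.unpack(">I", struct.pack(">f", float(value)))[0]
    rounded = struct.unpack(">f", struct.pack(">I", bits))[0]
    flags = INX if rounded != value else 0   # inexact if rounding changed the value
    return bits, flags

print(hex(ifloat(0x7FFFFFF1)[0]))
```

Integers with a magnitude above 2^24 are the only inputs that can be inexact, because single precision holds 24 significant bits.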
ifloatflags: ieee status flags from convert signed integer to floating-point

syntax: [ if rguard ] ifloatflags rsrc1 rdest

function:
if rguard then
  rdest ← ieee_flags((float) ((long)rsrc1))

attributes: function unit falu; operation code 130; number of operands 1; modifier no; modifier range -; latency 3; issue slots 1, 4

description:
the ifloatflags operation computes the ieee exceptions that would result from converting the signed integer in rsrc1 to a single-precision ieee floating-point value, and an integer bit vector representing the computed exception flags is written into rdest. the bit vector stored in rdest has the same format as the ieee exception bits in the pcsw. the exception flags in pcsw are left unchanged by this operation. rounding is according to the ieee rounding mode bits in pcsw.
the ifloatflags operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

examples (initial values | operation | result):
r30 = 3 | ifloatflags r30 r100 | r100 ← 0
r40 = 0xffffffff (-1) | ifloatflags r40 r105 | r105 ← 0
r10 = 0, r50 = 0xfffffffd | if r10 ifloatflags r50 r110 | no change, since guard is false
r20 = 1, r50 = 0xfffffffd | if r20 ifloatflags r50 r115 | r115 ← 0
r60 = 0x7fffffff (2147483647) | ifloatflags r60 r117 | r117 ← 0x02 (inx)
r70 = 0x80000000 (-2147483648) | ifloatflags r70 r120 | r120 ← 0
r80 = 0x7ffffff1 (2147483633) | ifloatflags r80 r122 | r122 ← 0x02 (inx)

[figure: exception-flag bit vector in rdest: bit 6 ofz, bit 5 ifz, bit 4 inv, bit 3 ovf, bit 2 unf, bit 1 inx, bit 0 dbz; bits 31..7 are 0]

see also: ifloat ifloatrzflags ufloatflags ufloatrzflags
ifloatrz: convert signed integer to floating-point with rounding toward zero

syntax: [ if rguard ] ifloatrz rsrc1 rdest

function:
if rguard then {
  rdest ← (float) ((long)rsrc1)
}

attributes: function unit falu; operation code 117; number of operands 1; modifier no; modifier range -; latency 3; issue slots 1, 4

description:
the ifloatrz operation converts the signed integer value in rsrc1 to single-precision ieee floating-point format and writes the result into rdest. rounding is performed toward zero; the ieee rounding mode bits in pcsw are ignored. this is the preferred rounding mode for ansi c. if ifloatrz causes an ieee exception, such as inexact, the corresponding exception flags in the pcsw are set. the pcsw exception flags are sticky: the flags can be set as a side-effect of any floating-point operation but can only be reset by an explicit writepcsw operation. the update of the pcsw exception flags occurs at the same time as rdest is written. if any other floating-point compute operations update the pcsw at the same time, the net result in each exception flag is the logical or of all simultaneous updates ored with the existing pcsw value for that exception flag.
the ifloatrzflags operation computes the exception flags that would result from an individual ifloatrz.
the ifloatrz operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest and the exception flags in pcsw are written; otherwise, rdest is not changed and the operation does not affect the exception flags in pcsw.

examples (initial values | operation | result):
r30 = 3 | ifloatrz r30 r100 | r100 ← 0x40400000 (3.0)
r40 = 0xffffffff (-1) | ifloatrz r40 r105 | r105 ← 0xbf800000 (-1.0)
r10 = 0, r50 = 0xfffffffd | if r10 ifloatrz r50 r110 | no change, since guard is false
r20 = 1, r50 = 0xfffffffd | if r20 ifloatrz r50 r115 | r115 ← 0xc0400000 (-3.0)
r60 = 0x7fffffff (2147483647) | ifloatrz r60 r117 | r117 ← 0x4effffff (2.147483520e+9), inx flag set
r70 = 0x80000000 (-2147483648) | ifloatrz r70 r120 | r120 ← 0xcf000000 (-2.147483648e+9)
r80 = 0x7ffffff1 (2147483633) | ifloatrz r80 r122 | r122 ← 0x4effffff (2.147483520e+9), inx flag set

see also: ifloat ufloatrz ifixieee ifloatflags
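Rounding toward zero amounts to discarding the integer bits that do not fit in the 24-bit significand. The Python sketch below builds the single-precision bit pattern by hand to make that truncation visible; it is an illustrative model, and the inx value 0x02 is taken from the flags examples.

```python
INX = 0x02  # inexact flag value as shown in the flags examples

def ifloatrz(rsrc1):
    """Return (single-precision bit pattern, flags), truncating toward zero."""
    sign = rsrc1 & 0x80000000
    magnitude = ((1 << 32) - rsrc1) if sign else rsrc1   # absolute value
    if magnitude == 0:
        return 0, 0
    top = magnitude.bit_length() - 1      # position of the leading 1 bit
    if top > 23:
        dropped = magnitude & ((1 << (top - 23)) - 1)    # bits that do not fit
        significand = magnitude >> (top - 23)            # truncate, never round up
    else:
        dropped = 0
        significand = magnitude << (23 - top)
    bits = sign | ((top + 127) << 23) | (significand & 0x7FFFFF)
    return bits, (INX if dropped else 0)

print(hex(ifloatrz(0x7FFFFFFF)[0]))
```

Compared with ifloat, the inputs 0x7fffffff and 0x7ffffff1 land on 0x4effffff rather than 0x4f000000, because the magnitude is never rounded up.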
ifloatrzflags: ieee status flags from convert signed integer to floating-point with rounding toward zero

syntax: [ if rguard ] ifloatrzflags rsrc1 rdest

function:
if rguard then
  rdest ← ieee_flags((float) ((long)rsrc1))

attributes: function unit falu; operation code 118; number of operands 1; modifier no; modifier range -; latency 3; issue slots 1, 4

description:
the ifloatrzflags operation computes the ieee exceptions that would result from converting the signed integer in rsrc1 to a single-precision ieee floating-point value, and an integer bit vector representing the computed exception flags is written into rdest. the bit vector stored in rdest has the same format as the ieee exception bits in the pcsw. the exception flags in pcsw are left unchanged by this operation. rounding is performed toward zero; the ieee rounding mode bits in pcsw are ignored.
the ifloatrzflags operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

examples (initial values | operation | result):
r30 = 3 | ifloatrzflags r30 r100 | r100 ← 0
r40 = 0xffffffff (-1) | ifloatrzflags r40 r105 | r105 ← 0
r10 = 0, r50 = 0xfffffffd | if r10 ifloatrzflags r50 r110 | no change, since guard is false
r20 = 1, r50 = 0xfffffffd | if r20 ifloatrzflags r50 r115 | r115 ← 0
r60 = 0x7fffffff (2147483647) | ifloatrzflags r60 r117 | r117 ← 0x02 (inx)
r70 = 0x80000000 (-2147483648) | ifloatrzflags r70 r120 | r120 ← 0
r80 = 0x7ffffff1 (2147483633) | ifloatrzflags r80 r122 | r122 ← 0x02 (inx)

[figure: exception-flag bit vector in rdest: bit 6 ofz, bit 5 ifz, bit 4 inv, bit 3 ovf, bit 2 unf, bit 1 inx, bit 0 dbz; bits 31..7 are 0]

see also: ifloatrz ifloatflags ufloatflags ufloatrzflags
igeq: signed compare greater or equal

syntax: [ if rguard ] igeq rsrc1 rsrc2 rdest

function:
if rguard then {
  if rsrc1 >= rsrc2 then
    rdest ← 1
  else
    rdest ← 0
}

attributes: function unit alu; operation code 14; number of operands 2; modifier no; modifier range -; latency 1; issue slots 1, 2, 3, 4, 5

description:
the igeq operation sets the destination register, rdest, to 1 if the first argument, rsrc1, is greater than or equal to the second argument, rsrc2; otherwise, rdest is set to 0. the arguments are treated as signed integers.
the igeq operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

examples (initial values | operation | result):
r30 = 3, r40 = 4 | igeq r30 r40 r80 | r80 ← 0
r10 = 0, r60 = 0x100, r30 = 3 | if r10 igeq r60 r30 r50 | no change, since guard is false
r20 = 1, r50 = 0x1000, r60 = 0x100 | if r20 igeq r50 r60 r90 | r90 ← 1
r70 = 0x80000000, r40 = 4 | igeq r70 r40 r100 | r100 ← 0
r70 = 0x80000000 | igeq r70 r70 r110 | r110 ← 1

see also: ileq igeqi
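The effect of treating the arguments as signed is easiest to see with 0x80000000, which is the most negative value rather than a large positive one. A minimal Python sketch (the helper `to_signed32` is a hypothetical name):

```python
def to_signed32(word):
    """Reinterpret a 32-bit register image as a signed integer."""
    return word - (1 << 32) if word & 0x80000000 else word

def igeq(rsrc1, rsrc2):
    """Signed compare: 1 if rsrc1 >= rsrc2, else 0."""
    return 1 if to_signed32(rsrc1) >= to_signed32(rsrc2) else 0

print(igeq(0x80000000, 4))
```

An unsigned comparison of the same operands would give the opposite answer for the fourth example on this page.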
igeqi: signed compare greater or equal with immediate

syntax: [ if rguard ] igeqi(n) rsrc1 rdest

function:
if rguard then {
  if rsrc1 >= n then
    rdest ← 1
  else
    rdest ← 0
}

attributes: function unit alu; operation code 1; number of operands 1; modifier 7 bits; modifier range -64..63; latency 1; issue slots 1, 2, 3, 4, 5

description:
the igeqi operation sets the destination register, rdest, to 1 if the first argument, rsrc1, is greater than or equal to the opcode modifier, n; otherwise, rdest is set to 0. the arguments are treated as signed integers.
the igeqi operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

examples (initial values | operation | result):
r30 = 3 | igeqi(2) r30 r80 | r80 ← 1
r30 = 3 | igeqi(3) r30 r90 | r90 ← 1
r30 = 3 | igeqi(4) r30 r100 | r100 ← 0
r10 = 0, r40 = 0x100 | if r10 igeqi(63) r40 r50 | no change, since guard is false
r20 = 1, r40 = 0x100 | if r20 igeqi(63) r40 r100 | r100 ← 1
r60 = 0x80000000 | igeqi(-64) r60 r120 | r120 ← 0

see also: igeq iles ieqli
igtr: signed compare greater

syntax: [ if rguard ] igtr rsrc1 rsrc2 rdest

function:
if rguard then {
  if rsrc1 > rsrc2 then
    rdest ← 1
  else
    rdest ← 0
}

attributes: function unit alu; operation code 15; number of operands 2; modifier no; modifier range -; latency 1; issue slots 1, 2, 3, 4, 5

description:
the igtr operation sets the destination register, rdest, to 1 if the first argument, rsrc1, is greater than the second argument, rsrc2; otherwise, rdest is set to 0. the arguments are treated as signed integers.
the igtr operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

examples (initial values | operation | result):
r30 = 3, r40 = 4 | igtr r30 r40 r80 | r80 ← 0
r10 = 0, r60 = 0x100, r30 = 3 | if r10 igtr r60 r30 r50 | no change, since guard is false
r20 = 1, r50 = 0x1000, r60 = 0x100 | if r20 igtr r50 r60 r90 | r90 ← 1
r70 = 0x80000000, r40 = 4 | igtr r70 r40 r100 | r100 ← 0
r70 = 0x80000000 | igtr r70 r70 r110 | r110 ← 0

see also: iles igtri
igtri: signed compare greater with immediate

syntax: [ if rguard ] igtri(n) rsrc1 rdest

function:
if rguard then {
  if rsrc1 > n then
    rdest ← 1
  else
    rdest ← 0
}

attributes: function unit alu; operation code 0; number of operands 1; modifier 7 bits; modifier range -64..63; latency 1; issue slots 1, 2, 3, 4, 5

description:
the igtri operation sets the destination register, rdest, to 1 if the first argument, rsrc1, is greater than the opcode modifier, n; otherwise, rdest is set to 0. the arguments are treated as signed integers.
the igtri operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

examples (initial values | operation | result):
r30 = 3 | igtri(2) r30 r80 | r80 ← 1
r30 = 3 | igtri(3) r30 r90 | r90 ← 0
r30 = 3 | igtri(4) r30 r100 | r100 ← 0
r10 = 0, r40 = 0x100 | if r10 igtri(63) r40 r50 | no change, since guard is false
r20 = 1, r40 = 0x100 | if r20 igtri(63) r40 r100 | r100 ← 1
r60 = 0x80000000 | igtri(-64) r60 r120 | r120 ← 0

see also: igtr igeqi
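The immediate form can be modelled the same way as the register form, with one extra constraint: the 7-bit modifier only covers -64..63. In the Python sketch below, rejecting an out-of-range modifier with an exception is a modelling choice, not documented assembler behaviour.

```python
def to_signed32(word):
    """Reinterpret a 32-bit register image as a signed integer."""
    return word - (1 << 32) if word & 0x80000000 else word

def igtri(n, rsrc1):
    """Signed compare with immediate: 1 if rsrc1 > n, else 0."""
    if not -64 <= n <= 63:
        raise ValueError("modifier n must fit in 7 signed bits (-64..63)")
    return 1 if to_signed32(rsrc1) > n else 0

print(igtri(3, 3))
```

Comparisons against constants outside that range would need the constant in a register (for example via iimm) and the register form igtr.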
iimm: signed immediate

syntax: iimm(n) rdest

function:
rdest ← n

attributes: function unit const; operation code 191; number of operands 0; modifier 32 bits; modifier range 0x80000000..0x7fffffff; latency 1; issue slots 1, 2, 3, 4, 5

description:
the iimm operation stores the signed 32-bit opcode modifier n into rdest. note: this operation is not guarded.

examples (initial values | operation | result):
(none) | iimm(2) r10 | r10 ← 2
(none) | iimm(0x100) r20 | r20 ← 0x100
(none) | iimm(0xfffc0000) r30 | r30 ← 0xfffc0000

see also: uimm
ijmpf: interruptible indirect jump on false

syntax: [ if rguard ] ijmpf rsrc1 rsrc2

function:
if rguard then {
  if (rsrc1 & 1) = 0 then {
    dpc ← rsrc2
    if exception is pending then
      service exception
    else if interrupt is pending then
      service interrupts
    else
      pc, spc ← rsrc2
  }
}

attributes: function unit branch; operation code 181; number of operands 2; modifier no; modifier range -; delay 3; issue slots 2, 3, 4

description:
the ijmpf operation conditionally changes the program flow and allows pending interrupts or exceptions to be serviced. if neither interrupts nor exceptions are pending and the lsb of rsrc1 is 0, the dpc, pc, and spc registers are set equal to rsrc2. if an interrupt or exception is pending and the lsb of rsrc1 is 0, dpc is set equal to rsrc2 and the service routine is invoked, where exceptions have priority over interrupts. if the lsb of rsrc1 is 1, program execution continues with the next sequential instruction.
the ijmpf operation optionally takes a guard, specified in rguard. if a guard is present, its lsb adds another condition to the jump. if the lsb of rguard is 1, the instruction executes as previously described; otherwise, the jump will not be taken and pc, dpc, and spc are not modified regardless of the value of rsrc1.

examples (initial values | operation | result):
r50 = 0, r70 = 0x330 | ijmpf r50 r70 | program execution continues at 0x330 after first servicing pending interrupts
r20 = 1, r70 = 0x330 | ijmpf r20 r70 | since r20 is true, program execution continues with next sequential instruction
r30 = 0, r50 = 0, r60 = 0x8000 | if r30 ijmpf r50 r60 | since guard is false, program execution continues with next sequential instruction
r40 = 1, r50 = 0, r60 = 0x8000 | if r40 ijmpf r50 r60 | program execution continues at 0x8000 after first servicing pending interrupts

see also: jmpf jmpt jmpi ijmpt ijmpi
ijmpi: interruptible jump immediate

syntax: [ if rguard ] ijmpi(address)

function:
if rguard then {
  dpc ← address
  if exception is pending then
    service exception
  else if interrupt is pending then
    service interrupts
  else
    pc, spc ← address
}

attributes: function unit branch; operation code 179; number of operands 0; modifier 32 bits; modifier range 0..0xffffffff; delay 3; issue slots 2, 3, 4

description:
the ijmpi operation changes the program flow and allows pending interrupts or exceptions to be serviced. if no interrupts or exceptions are pending, the dpc, pc, and spc registers are set equal to address. if an exception or interrupt is pending, dpc is set equal to address and a service routine is invoked, where exceptions have priority over interrupts. address is an immediate opcode modifier.
the ijmpi operation optionally takes a guard, specified in rguard. if a guard is present, its lsb adds a condition to the jump. if the lsb of rguard is 1, the instruction executes as previously described; otherwise, the jump will not be taken and pc, dpc, and spc are not modified.

examples (initial values | operation | result):
(none) | ijmpi(0x330) | program execution continues at 0x330
r30 = 0 | if r30 ijmpi(0x8000) | since guard is false, program execution continues with next sequential instruction
r40 = 1 | if r40 ijmpi(0x8000) | program execution continues at 0x8000

see also: jmpf jmpt jmpi ijmpf ijmpt
ijmpt: interruptible indirect jump on true

syntax: [ if rguard ] ijmpt rsrc1 rsrc2

function:
if rguard then {
  if (rsrc1 & 1) = 1 then {
    dpc ← rsrc2
    if exception is pending then
      service exception
    else if interrupt is pending then
      service interrupts
    else
      pc, spc ← rsrc2
  }
}

attributes: function unit branch; operation code 177; number of operands 2; modifier no; modifier range -; delay 3; issue slots 2, 3, 4

description:
the ijmpt operation conditionally changes the program flow and allows pending interrupts or exceptions to be serviced. if no interrupts or exceptions are pending and the lsb of rsrc1 is 1, the dpc, pc, and spc registers are set equal to rsrc2. if an exception or interrupt is pending and the lsb of rsrc1 is 1, dpc is set equal to rsrc2 and a service routine is invoked, where exceptions have priority over interrupts. if the lsb of rsrc1 is 0, program execution continues with the next sequential instruction.
the ijmpt operation optionally takes a guard, specified in rguard. if a guard is present, its lsb adds another condition to the jump. if the lsb of rguard is 1, the instruction executes as previously described; otherwise, the jump will not be taken and pc, dpc, and spc are not modified regardless of the value of rsrc1.

examples (initial values | operation | result):
r50 = 1, r70 = 0x330 | ijmpt r50 r70 | program execution continues at 0x330 after first servicing pending interrupts
r20 = 0, r70 = 0x330 | ijmpt r20 r70 | since r20 is false, program execution continues with next sequential instruction
r30 = 0, r50 = 1, r60 = 0x8000 | if r30 ijmpt r50 r60 | since guard is false, program execution continues with next sequential instruction
r40 = 1, r50 = 1, r60 = 0x8000 | if r40 ijmpt r50 r60 | program execution continues at 0x8000 after first servicing pending interrupts

see also: jmpf jmpt jmpi ijmpf ijmpi
ild16: signed 16-bit load (pseudo-op for ild16d(0))

syntax: [ if rguard ] ild16 rsrc1 rdest

function:
if rguard then {
  if pcsw.bytesex = little_endian then
    bs ← 1
  else
    bs ← 0
  temp<7:0> ← mem[rsrc1 + (1 ⊕ bs)]
  temp<15:8> ← mem[rsrc1 + (0 ⊕ bs)]
  rdest ← sign_ext16to32(temp<15:0>)
}

attributes: function unit dmem; operation code 6; number of operands 1; modifier no; modifier range -; latency 3; issue slots 4, 5

description:
the ild16 operation is a pseudo operation transformed by the scheduler into an ild16d(0) with the same argument. (note: pseudo operations cannot be used in assembly source files.)
the ild16 operation loads the 16-bit memory value from the address contained in rsrc1, sign extends it to 32 bits, and stores the result in rdest. if the memory address contained in rsrc1 is not a multiple of 2, the result of ild16 is undefined but no exception will be raised. this load operation is performed as little-endian or big-endian depending on the current setting of the bytesex bit in the pcsw.
the result of an access by ild16 to the mmio address aperture is undefined; access to the mmio aperture is defined only for 32-bit loads and stores.
the ild16 operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register and the occurrence of side effects. if the lsb of rguard is 1, rdest is written and the data cache status bits are updated if the addressed locations are cacheable. if the lsb of rguard is 0, rdest is not changed and ild16 has no side effects whatever.

examples (initial values | operation | result):
r10 = 0xd00, [0xd00] = 0x22, [0xd01] = 0x11 | ild16 r10 r60 | r60 ← 0x00002211
r30 = 0, r20 = 0xd04, [0xd04] = 0x84, [0xd05] = 0x33 | if r30 ild16 r20 r70 | no change, since guard is false
r40 = 1, r20 = 0xd04, [0xd04] = 0x84, [0xd05] = 0x33 | if r40 ild16 r20 r80 | r80 ← 0xffff8433
r50 = 0xd01 | ild16 r50 r90 | r90 undefined, since 0xd01 is not a multiple of 2

see also: ild16d ild16r ild16x
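The byte selection in the ild16 function can be checked with a short Python model. It assumes that the operator lost from the scanned expression `(1 bs)` is exclusive-or, which is the reading that reproduces the examples (these correspond to big-endian mode, bs = 0); memory is modelled as a plain dictionary, which is purely an illustration device.

```python
def ild16(mem, addr, little_endian=False):
    """Load a 16-bit value at addr and sign-extend it to a 32-bit register image."""
    bs = 1 if little_endian else 0
    low = mem[addr + (1 ^ bs)]    # temp<7:0>  (assumed xor, see lead-in)
    high = mem[addr + (0 ^ bs)]   # temp<15:8>
    temp = (high << 8) | low
    return (temp | 0xFFFF0000) if temp & 0x8000 else temp

memory = {0xD04: 0x84, 0xD05: 0x33}
print(hex(ild16(memory, 0xD04)))
```

With `little_endian=True` the same two bytes would be combined in the opposite order. Unaligned addresses, whose result the text leaves undefined, are not modelled.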
pnx1300/01/02/11 data book philips semiconductors a-105 preliminary specification signed 16-bit load with displacement syntax [ if r guard ] ild16d( d ) r src1 r dest function if r guard then { if pcsw.bytesex = little_endian then bs 1 else bs 0 temp<7:0> mem[(r src1 + d + (1 bs)] temp<15:8> mem[(r src1 + d + (0 bs)] r dest sign_ext16to32(temp<15:0>) } attributes function unit dmem operation code 6 number of operands 1 modifier 7 bits modifier range ?128..126 by 2 latency 3 issue slots 4, 5 description the ild16d operation loads the 16-bit memory va lue from the address computed by r src1 + d , sign extends it to 32 bits, and stores the result in r dest . the d value is an opcode modifier, must be in the range ?128 to 126 inclusive, and must be a multiple of 2. if the memory address computed by r src1 + d is not a multiple of 2, the result of ild16d is undefined but no exception will be raised. this load operation is performe d as little-endian or big-endian depending on the current setting of the bytesex bit in the pcsw. the result of an access by ild16d to the mmio address aperture is und efined; access to the mmio aperture is defined only for 32-bit loads and stores. the ild16d operation optionally takes a guard, specified in r guard . if a guard is present, its lsb controls the modification of the destination register and th e occurrence of side effects. if the lsb of r guard is 1, r dest is written and the data cache status bits are updated if the addr essed locations are cacheable. if the lsb of r guard is 0, r dest is not changed and ild16d has no side effects whatever. 
examples

initial values: r10 = 0xd00, [0xd02] = 0x22, [0xd03] = 0x11
operation: ild16d(2) r10 → r60
result: r60 ← 0x00002211

initial values: r30 = 0, r20 = 0xd04, [0xd00] = 0x84, [0xd01] = 0x33
operation: if r30 ild16d(-4) r20 → r70
result: no change, since guard is false

initial values: r40 = 1, r20 = 0xd04, [0xd00] = 0x84, [0xd01] = 0x33
operation: if r40 ild16d(-4) r20 → r80
result: r80 ← 0xffff8433

initial values: r50 = 0xd01
operation: ild16d(-4) r50 → r90
result: r90 undefined, since 0xd01 + (-4) is not a multiple of 2

see also: ild16 uld16 uld16d ild16r uld16r ild16x uld16x
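The function and examples for ild16d can be cross-checked with a small software model. The following Python sketch is illustrative only: the function names, the dictionary-based `mem`, and returning `None` for the undefined misaligned case are assumptions made for this example, not part of the data book. The worked examples appear to assume big-endian byte sex, so that is the default here.

```python
def sign_ext16to32(value):
    """Sign-extend a 16-bit value to a 32-bit register pattern."""
    value &= 0xFFFF
    return value | 0xFFFF0000 if value & 0x8000 else value


def ild16d(mem, rsrc1, d, little_endian=False, guard=1, rdest_old=0):
    """Model of ild16d(d) rsrc1 -> rdest; mem maps byte addresses to bytes."""
    if not guard & 1:
        return rdest_old                  # guard false: rdest is not changed
    if not -128 <= d <= 126 or d % 2:
        raise ValueError("d must be an even value in -128..126")
    addr = (rsrc1 + d) & 0xFFFFFFFF
    if addr % 2:
        return None                       # undefined on hardware (assumed marker)
    bs = 1 if little_endian else 0
    low = mem[addr + (1 ^ bs)]            # temp<7:0>
    high = mem[addr + (0 ^ bs)]           # temp<15:8>
    return sign_ext16to32((high << 8) | low)


memory = {0xD00: 0x84, 0xD01: 0x33, 0xD02: 0x22, 0xD03: 0x11}
print(hex(ild16d(memory, 0xD00, 2)))      # big-endian load of bytes 0x22, 0x11
print(hex(ild16d(memory, 0xD04, -4)))     # negative halfword is sign extended
```

Passing `little_endian=True` swaps which of the two bytes lands in bits 7:0, which is the only effect of the bytesex setting in this model.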
ild16r
signed 16-bit load with index

syntax
[ if rguard ] ild16r rsrc1 rsrc2 → rdest

function
if rguard then {
   if pcsw.bytesex = little_endian then
      bs ← 1
   else
      bs ← 0
   temp<7:0> ← mem[rsrc1 + rsrc2 + (1 ⊕ bs)]
   temp<15:8> ← mem[rsrc1 + rsrc2 + (0 ⊕ bs)]
   rdest ← sign_ext16to32(temp<15:0>)
}

attributes
function unit: dmem
operation code: 195
number of operands: 2
modifier: no
modifier range: -
latency: 3
issue slots: 4, 5

description
the ild16r operation loads the 16-bit memory value from the address computed by rsrc1 + rsrc2, sign extends it to 32 bits, and stores the result in rdest. if the memory address computed by rsrc1 + rsrc2 is not a multiple of 2, the result of ild16r is undefined but no exception will be raised. this load operation is performed as little-endian or big-endian depending on the current setting of the bytesex bit in the pcsw.

the result of an access by ild16r to the mmio address aperture is undefined; access to the mmio aperture is defined only for 32-bit loads and stores.

the ild16r operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register and the occurrence of side effects. if the lsb of rguard is 1, rdest is written and the data cache status bits are updated if the addressed locations are cacheable. if the lsb of rguard is 0, rdest is not changed and ild16r has no side effects whatever.
examples

initial values: r10 = 0xd00, r20 = 2, [0xd02] = 0x22, [0xd03] = 0x11
operation: ild16r r10 r20 → r80
result: r80 ← 0x00002211

initial values: r50 = 0, r40 = 0xd04, r30 = 0xfffffffc, [0xd00] = 0x84, [0xd01] = 0x33
operation: if r50 ild16r r40 r30 → r90
result: no change, since guard is false

initial values: r60 = 1, r40 = 0xd04, r30 = 0xfffffffc, [0xd00] = 0x84, [0xd01] = 0x33
operation: if r60 ild16r r40 r30 → r100
result: r100 ← 0xffff8433

initial values: r70 = 0xd01, r30 = 0xfffffffc
operation: ild16r r70 r30 → r110
result: r110 undefined, since 0xd01 + (-4) is not a multiple of 2

see also: ild16 uld16 ild16d uld16d uld16r ild16x uld16x
ild16x
signed 16-bit load with scaled index

syntax
[ if rguard ] ild16x rsrc1 rsrc2 → rdest

function
if rguard then {
   if pcsw.bytesex = little_endian then
      bs ← 1
   else
      bs ← 0
   temp<7:0> ← mem[rsrc1 + (2 × rsrc2) + (1 ⊕ bs)]
   temp<15:8> ← mem[rsrc1 + (2 × rsrc2) + (0 ⊕ bs)]
   rdest ← sign_ext16to32(temp<15:0>)
}

attributes
function unit: dmem
operation code: 196
number of operands: 2
modifier: no
modifier range: -
latency: 3
issue slots: 4, 5

description
the ild16x operation loads the 16-bit memory value from the address computed by rsrc1 + 2 × rsrc2, sign extends it to 32 bits, and stores the result in rdest. if the memory address computed by rsrc1 + 2 × rsrc2 is not a multiple of 2, the result of ild16x is undefined but no exception will be raised. this load operation is performed as little-endian or big-endian depending on the current setting of the bytesex bit in the pcsw.

the result of an access by ild16x to the mmio address aperture is undefined; access to the mmio aperture is defined only for 32-bit loads and stores.

the ild16x operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register and the occurrence of side effects. if the lsb of rguard is 1, rdest is written and the data cache status bits are updated if the addressed locations are cacheable. if the lsb of rguard is 0, rdest is not changed and ild16x has no side effects whatever.
examples

initial values: r10 = 0xd00, r30 = 1, [0xd02] = 0x22, [0xd03] = 0x11
operation: ild16x r10 r30 → r100
result: r100 ← 0x00002211

initial values: r50 = 0, r40 = 0xd04, r20 = 0xfffffffe, [0xd00] = 0x84, [0xd01] = 0x33
operation: if r50 ild16x r40 r20 → r80
result: no change, since guard is false

initial values: r60 = 1, r40 = 0xd04, r20 = 0xfffffffe, [0xd00] = 0x84, [0xd01] = 0x33
operation: if r60 ild16x r40 r20 → r90
result: r90 ← 0xffff8433

initial values: r70 = 0xd01, r30 = 1
operation: ild16x r70 r30 → r110
result: r110 undefined, since 0xd01 + 2 × 1 is not a multiple of 2

see also: ild16 uld16 ild16d uld16d ild16r uld16r uld16x
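The four signed 16-bit loads differ only in how the byte address is formed. As a rough illustration, the sketch below computes that address for each variant and checks the alignment rule. The helper names are hypothetical, and the 32-bit wraparound is an assumption inferred from the examples, where 0xfffffffc acts as -4.

```python
MASK32 = 0xFFFFFFFF


def effective_address(op, rsrc1, rsrc2=0, d=0):
    """Return the byte address used by one of the signed 16-bit loads."""
    if op == "ild16":
        offset = 0                 # plain load: address is rsrc1
    elif op == "ild16d":
        offset = d                 # displacement from the opcode modifier
    elif op == "ild16r":
        offset = rsrc2             # index register, unscaled
    elif op == "ild16x":
        offset = 2 * rsrc2         # index register scaled by the access size
    else:
        raise ValueError("unknown operation: " + op)
    return (rsrc1 + offset) & MASK32   # assumed 32-bit wraparound


def is_aligned16(address):
    """16-bit loads are defined only for addresses that are multiples of 2."""
    return address % 2 == 0


addr = effective_address("ild16x", 0xD04, rsrc2=0xFFFFFFFE)
print(hex(addr), is_aligned16(addr))
```

With the scaled form, an index of 0xfffffffe (that is, -2) steps back two halfwords, which is why the ild16x and ild16r examples reach the same address with different index values.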
ild8
signed 8-bit load
pseudo-op for ild8d(0)

syntax
[ if rguard ] ild8 rsrc1 → rdest

function
if rguard then
   rdest ← sign_ext8to32(mem[rsrc1])

attributes
function unit: dmem
operation code: 192
number of operands: 1
modifier: no
modifier range: -
latency: 3
issue slots: 4, 5

description
the ild8 operation is a pseudo operation transformed by the scheduler into an ild8d(0) with the same argument. (note: pseudo operations cannot be used in assembly source files.)

the ild8 operation loads the 8-bit memory value from the address contained in rsrc1, sign extends it to 32 bits, and stores the result in rdest. this operation does not depend on the bytesex bit in the pcsw since only a single byte is loaded.

the result of an access by ild8 to the mmio address aperture is undefined; access to the mmio aperture is defined only for 32-bit loads and stores.

the ild8 operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register and the occurrence of side effects. if the lsb of rguard is 1, rdest is written and the data cache status bits are updated if the addressed location is cacheable. if the lsb of rguard is 0, rdest is not changed and ild8 has no side effects whatever.

examples

initial values: r10 = 0xd00, [0xd00] = 0x22
operation: ild8 r10 → r60
result: r60 ← 0x00000022

initial values: r30 = 0, r20 = 0xd04, [0xd04] = 0x84
operation: if r30 ild8 r20 → r70
result: no change, since guard is false

initial values: r40 = 1, r20 = 0xd04, [0xd04] = 0x84
operation: if r40 ild8 r20 → r80
result: r80 ← 0xffffff84

initial values: r50 = 0xd01, [0xd01] = 0x33
operation: ild8 r50 → r90
result: r90 ← 0x00000033

see also: uld8 ild8d uld8d ild8r uld8r
ild8d
signed 8-bit load with displacement

syntax
[ if rguard ] ild8d(d) rsrc1 → rdest

function
if rguard then
   rdest ← sign_ext8to32(mem[rsrc1 + d])

attributes
function unit: dmem
operation code: 192
number of operands: 1
modifier: 7 bits
modifier range: -64..63
latency: 3
issue slots: 4, 5

description
the ild8d operation loads the 8-bit memory value from the address computed by rsrc1 + d, sign extends it to 32 bits, and stores the result in rdest. the d value is an opcode modifier in the range -64 to 63, inclusive. this operation does not depend on the bytesex bit in the pcsw since only a single byte is loaded.

the result of an access by ild8d to the mmio address aperture is undefined; access to the mmio aperture is defined only for 32-bit loads and stores.

the ild8d operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register and the occurrence of side effects. if the lsb of rguard is 1, rdest is written and the data cache status bits are updated if the addressed location is cacheable. if the lsb of rguard is 0, rdest is not changed and ild8d has no side effects whatever.

examples

initial values: r10 = 0xd00, [0xd02] = 0x22
operation: ild8d(2) r10 → r60
result: r60 ← 0x00000022

initial values: r30 = 0, r20 = 0xd04, [0xd00] = 0x84
operation: if r30 ild8d(-4) r20 → r70
result: no change, since guard is false

initial values: r40 = 1, r20 = 0xd04, [0xd00] = 0x84
operation: if r40 ild8d(-4) r20 → r80
result: r80 ← 0xffffff84

initial values: r50 = 0xd05, [0xd01] = 0x33
operation: ild8d(-4) r50 → r90
result: r90 ← 0x00000033

see also: ild8 uld8 uld8d ild8r uld8r
ild8r
signed 8-bit load with index

syntax
[ if rguard ] ild8r rsrc1 rsrc2 → rdest

function
if rguard then
   rdest ← sign_ext8to32(mem[rsrc1 + rsrc2])

attributes
function unit: dmem
operation code: 193
number of operands: 2
modifier: no
modifier range: -
latency: 3
issue slots: 4, 5

description
the ild8r operation loads the 8-bit memory value from the address computed by rsrc1 + rsrc2, sign extends it to 32 bits, and stores the result in rdest. this operation does not depend on the bytesex bit in the pcsw since only a single byte is loaded.

the result of an access by ild8r to the mmio address aperture is undefined; access to the mmio aperture is defined only for 32-bit loads and stores.

the ild8r operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register and the occurrence of side effects. if the lsb of rguard is 1, rdest is written and the data cache status bits are updated if the addressed location is cacheable. if the lsb of rguard is 0, rdest is not changed and ild8r has no side effects whatever.

examples

initial values: r10 = 0xd00, r20 = 2, [0xd02] = 0x22
operation: ild8r r10 r20 → r80
result: r80 ← 0x00000022

initial values: r50 = 0, r40 = 0xd04, r30 = 0xfffffffc, [0xd00] = 0x84
operation: if r50 ild8r r40 r30 → r90
result: no change, since guard is false

initial values: r60 = 1, r40 = 0xd04, r30 = 0xfffffffc, [0xd00] = 0x84
operation: if r60 ild8r r40 r30 → r100
result: r100 ← 0xffffff84

initial values: r70 = 0xd05, r30 = 0xfffffffc, [0xd01] = 0x33
operation: ild8r r70 r30 → r110
result: r110 ← 0x00000033

see also: ild8 uld8 ild8d uld8d uld8r
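Because the 8-bit loads read a single byte, there is no alignment rule and no byte-sex dependence to model; only the sign extension matters. The sketch below is a minimal Python model of ild8r under assumed names (`sign_ext8to32`, a dictionary for memory) with 32-bit address wraparound assumed from the examples.

```python
def sign_ext8to32(byte):
    """Sign-extend an 8-bit value to a 32-bit register pattern."""
    byte &= 0xFF
    return byte | 0xFFFFFF00 if byte & 0x80 else byte


def ild8r(mem, rsrc1, rsrc2):
    """Model of ild8r rsrc1 rsrc2 -> rdest (guard omitted for brevity)."""
    address = (rsrc1 + rsrc2) & 0xFFFFFFFF   # assumed 32-bit wraparound
    return sign_ext8to32(mem[address])


memory = {0xD00: 0x84, 0xD01: 0x33, 0xD02: 0x22}
print(hex(ild8r(memory, 0xD04, 0xFFFFFFFC)))  # byte 0x84 has bit 7 set
print(hex(ild8r(memory, 0xD05, 0xFFFFFFFC)))  # odd address is fine for bytes
```

ild8 and ild8d follow from the same model by fixing the offset to 0 or to the displacement d.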
ileq
signed compare less or equal
pseudo-op for igeq

syntax
[ if rguard ] ileq rsrc1 rsrc2 → rdest

function
if rguard then {
   if rsrc1 <= rsrc2 then
      rdest ← 1
   else
      rdest ← 0
}

attributes
function unit: alu
operation code: 14
number of operands: 2
modifier: no
modifier range: -
latency: 1
issue slots: 1, 2, 3, 4, 5

description
the ileq operation is a pseudo operation transformed by the scheduler into an igeq with the arguments exchanged (ileq's rsrc1 is igeq's rsrc2 and vice versa). (note: pseudo operations cannot be used in assembly source files.)

the ileq operation sets the destination register, rdest, to 1 if the first argument, rsrc1, is less than or equal to the second argument, rsrc2; otherwise, rdest is set to 0. the arguments are treated as signed integers.

the ileq operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

examples

initial values: r30 = 3, r40 = 4
operation: ileq r30 r40 → r80
result: r80 ← 1

initial values: r10 = 0, r60 = 0x100, r30 = 3
operation: if r10 ileq r60 r30 → r50
result: no change, since guard is false

initial values: r20 = 1, r50 = 0x1000, r60 = 0x100
operation: if r20 ileq r50 r60 → r90
result: r90 ← 0

initial values: r70 = 0x80000000, r40 = 4
operation: ileq r70 r40 → r100
result: r100 ← 1

initial values: r70 = 0x80000000
operation: ileq r70 r70 → r110
result: r110 ← 1

see also: igeq ileqi
ileqi
signed compare less or equal with immediate

syntax
[ if rguard ] ileqi(n) rsrc1 → rdest

function
if rguard then {
   if rsrc1 <= n then
      rdest ← 1
   else
      rdest ← 0
}

attributes
function unit: alu
operation code: 42
number of operands: 1
modifier: 7 bits
modifier range: -64..63
latency: 1
issue slots: 1, 2, 3, 4, 5

description
the ileqi operation sets the destination register, rdest, to 1 if the first argument, rsrc1, is less than or equal to the opcode modifier, n; otherwise, rdest is set to 0. the arguments are treated as signed integers.

the ileqi operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

examples

initial values: r30 = 3
operation: ileqi(2) r30 → r80
result: r80 ← 0

initial values: r30 = 3
operation: ileqi(3) r30 → r90
result: r90 ← 1

initial values: r30 = 3
operation: ileqi(4) r30 → r100
result: r100 ← 1

initial values: r10 = 0, r40 = 0x100
operation: if r10 ileqi(63) r40 → r50
result: no change, since guard is false

initial values: r20 = 1, r40 = 0x100
operation: if r20 ileqi(63) r40 → r100
result: r100 ← 0

initial values: r60 = 0x80000000
operation: ileqi(-64) r60 → r120
result: r120 ← 1

see also: ileq igeqi
iles
signed compare less
pseudo-op for igtr

syntax
[ if rguard ] iles rsrc1 rsrc2 → rdest

function
if rguard then {
   if rsrc1 < rsrc2 then
      rdest ← 1
   else
      rdest ← 0
}

attributes
function unit: alu
operation code: 15
number of operands: 2
modifier: no
modifier range: -
latency: 1
issue slots: 1, 2, 3, 4, 5

description
the iles operation is a pseudo operation transformed by the scheduler into an igtr with the arguments exchanged (iles's rsrc1 is igtr's rsrc2 and vice versa). (note: pseudo operations cannot be used in assembly source files.)

the iles operation sets the destination register, rdest, to 1 if the first argument, rsrc1, is less than the second argument, rsrc2; otherwise, rdest is set to 0. the arguments are treated as signed integers.

the iles operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

examples

initial values: r30 = 3, r40 = 4
operation: iles r30 r40 → r80
result: r80 ← 1

initial values: r10 = 0, r60 = 0x100, r30 = 3
operation: if r10 iles r60 r30 → r50
result: no change, since guard is false

initial values: r20 = 1, r50 = 0x1000, r60 = 0x100
operation: if r20 iles r50 r60 → r90
result: r90 ← 0

initial values: r70 = 0x80000000, r40 = 4
operation: iles r70 r40 → r100
result: r100 ← 1

initial values: r70 = 0x80000000
operation: iles r70 r70 → r110
result: r110 ← 0

see also: igtr ilesi
ilesi
signed compare less with immediate

syntax
[ if rguard ] ilesi(n) rsrc1 → rdest

function
if rguard then {
   if rsrc1 < n then
      rdest ← 1
   else
      rdest ← 0
}

attributes
function unit: alu
operation code: 2
number of operands: 1
modifier: 7 bits
modifier range: -64..63
latency: 1
issue slots: 1, 2, 3, 4, 5

description
the ilesi operation sets the destination register, rdest, to 1 if the first argument, rsrc1, is less than the opcode modifier, n; otherwise, rdest is set to 0. the arguments are treated as signed integers.

the ilesi operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

examples

initial values: r30 = 3
operation: ilesi(2) r30 → r80
result: r80 ← 0

initial values: r30 = 3
operation: ilesi(3) r30 → r90
result: r90 ← 0

initial values: r30 = 3
operation: ilesi(4) r30 → r100
result: r100 ← 1

initial values: r10 = 0, r40 = 0x100
operation: if r10 ilesi(63) r40 → r50
result: no change, since guard is false

initial values: r20 = 1, r40 = 0x100
operation: if r20 ilesi(63) r40 → r100
result: r100 ← 0

initial values: r60 = 0x80000000
operation: ilesi(-64) r60 → r120
result: r120 ← 1

see also: iles ileqi
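The relationship between the pseudo operations and the operations the scheduler emits can be sketched in a few lines. The Python model below is an illustration under assumed names: `to_signed` is a hypothetical helper that reinterprets a 32-bit register pattern as a signed integer, and the immediate forms take the modifier n as their first parameter purely as a convention of this sketch.

```python
def to_signed(value):
    """Reinterpret a 32-bit register pattern as a signed integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def igeq(rsrc1, rsrc2):
    return int(to_signed(rsrc1) >= to_signed(rsrc2))


def igtr(rsrc1, rsrc2):
    return int(to_signed(rsrc1) > to_signed(rsrc2))


def ileq(rsrc1, rsrc2):
    return igeq(rsrc2, rsrc1)      # pseudo-op: igeq with arguments exchanged


def iles(rsrc1, rsrc2):
    return igtr(rsrc2, rsrc1)      # pseudo-op: igtr with arguments exchanged


def ileqi(n, rsrc1):
    return int(to_signed(rsrc1) <= n)   # n is the 7-bit modifier, -64..63


def ilesi(n, rsrc1):
    return int(to_signed(rsrc1) < n)


print(ileq(0x80000000, 4), iles(0x80000000, 0x80000000))
```

The signed interpretation is what makes 0x80000000 compare as less than 4; an unsigned compare of the same bit patterns would give the opposite answer.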
imax
signed maximum

syntax
[ if rguard ] imax rsrc1 rsrc2 → rdest

function
if rguard then {
   if rsrc1 > rsrc2 then
      rdest ← rsrc1
   else
      rdest ← rsrc2
}

attributes
function unit: dspalu
operation code: 24
number of operands: 2
modifier: no
modifier range: -
latency: 2
issue slots: 1, 3

description
the imax operation sets the destination register, rdest, to the contents of rsrc1 if rsrc1 > rsrc2; otherwise, rdest is set to the contents of rsrc2. the arguments are treated as signed integers.

the imax operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

examples

initial values: r30 = 2, r20 = 1
operation: imax r30 r20 → r80
result: r80 ← 2

initial values: r10 = 0, r60 = 0x100, r30 = 2
operation: if r10 imax r60 r30 → r50
result: no change, since guard is false

initial values: r20 = 1, r60 = 0x100, r40 = 0xffffff9c
operation: if r20 imax r60 r40 → r90
result: r90 ← 0x100

initial values: r70 = 0xffffff00, r40 = 0xffffff9c
operation: imax r70 r40 → r100
result: r100 ← 0xffffff9c

see also: imin
imin
signed minimum

syntax
[ if rguard ] imin rsrc1 rsrc2 → rdest

function
if rguard then {
   if rsrc1 > rsrc2 then
      rdest ← rsrc2
   else
      rdest ← rsrc1
}

attributes
function unit: dspalu
operation code: 23
number of operands: 2
modifier: no
modifier range: -
latency: 2
issue slots: 1, 3

description
the imin operation sets the destination register, rdest, to the contents of rsrc2 if rsrc1 > rsrc2; otherwise, rdest is set to the contents of rsrc1. the arguments are treated as signed integers.

the imin operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

examples

initial values: r30 = 2, r20 = 1
operation: imin r30 r20 → r80
result: r80 ← 1

initial values: r10 = 0, r60 = 0x100, r30 = 2
operation: if r10 imin r60 r30 → r50
result: no change, since guard is false

initial values: r20 = 1, r60 = 0x100, r40 = 0xffffff9c
operation: if r20 imin r60 r40 → r90
result: r90 ← 0xffffff9c

initial values: r70 = 0xffffff00, r40 = 0xffffff9c
operation: imin r70 r40 → r100
result: r100 ← 0xffffff00

see also: imax
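Both imax and imin are defined by the same signed test, rsrc1 > rsrc2, with the two branches swapped. A minimal Python sketch of that definition follows; the helper `to_signed` and the use of plain integers for register patterns are assumptions of the example.

```python
def to_signed(value):
    """Reinterpret a 32-bit register pattern as a signed integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def imax(rsrc1, rsrc2):
    """rdest gets rsrc1 when rsrc1 > rsrc2 (signed), else rsrc2."""
    return rsrc1 if to_signed(rsrc1) > to_signed(rsrc2) else rsrc2


def imin(rsrc1, rsrc2):
    """rdest gets rsrc2 when rsrc1 > rsrc2 (signed), else rsrc1."""
    return rsrc2 if to_signed(rsrc1) > to_signed(rsrc2) else rsrc1


# 0xffffff00 is -256 and 0xffffff9c is -100 when read as signed values.
print(hex(imax(0xFFFFFF00, 0xFFFFFF9C)), hex(imin(0xFFFFFF00, 0xFFFFFF9C)))
```

A common use of the pair is clamping a value to a range, for example `imin(imax(x, low), high)` in terms of this model.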
imul
signed multiply

syntax
[ if rguard ] imul rsrc1 rsrc2 → rdest

function
if rguard then
   temp ← (sign_ext32to64(rsrc1) × sign_ext32to64(rsrc2))
   rdest ← temp<31:0>

attributes
function unit: ifmul
operation code: 27
number of operands: 2
modifier: no
modifier range: -
latency: 3
issue slots: 2, 3

description
as shown below, the imul operation computes the product rsrc1 × rsrc2 and writes the least-significant 32 bits of the full 64-bit product into rdest. the operands are considered signed integers. no overflow or underflow detection is performed.

[figure: signed rsrc1 (bits 31..0) × signed rsrc2 (bits 31..0) gives a signed 64-bit result (bits 63..0); rdest (bits 31..0) receives the lower half]

the imul operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

examples

initial values: r60 = 0x100
operation: imul r60 r60 → r80
result: r80 ← 0x10000

initial values: r10 = 0, r60 = 0x100, r30 = 0xf11
operation: if r10 imul r60 r30 → r50
result: no change, since guard is false

initial values: r20 = 1, r60 = 0x100, r30 = 0xf11
operation: if r20 imul r60 r30 → r90
result: r90 ← 0xf1100

initial values: r70 = 0xffffff00, r40 = 0xffffff9c
operation: imul r70 r40 → r100
result: r100 ← 0x6400

see also: umul imulm umulm dspimul dspumul dspidualmul quadumulmsb fmul
imulm
signed multiply, return most-significant 32 bits

syntax
[ if rguard ] imulm rsrc1 rsrc2 → rdest

function
if rguard then
   temp ← (sign_ext32to64(rsrc1) × sign_ext32to64(rsrc2))
   rdest ← temp<63:32>

attributes
function unit: ifmul
operation code: 139
number of operands: 2
modifier: no
modifier range: -
latency: 3
issue slots: 2, 3

description
as shown below, the imulm operation computes the product rsrc1 × rsrc2 and writes the most-significant 32 bits of the full 64-bit product into rdest. the operands are considered signed integers.

[figure: signed rsrc1 (bits 31..0) × signed rsrc2 (bits 31..0) gives a signed 64-bit result (bits 63..0); rdest (bits 31..0) receives the upper half]

the imulm operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

examples

initial values: r60 = 0x10000
operation: imulm r60 r60 → r80
result: r80 ← 0x00000001

initial values: r10 = 0, r60 = 0x100, r30 = 0xf11
operation: if r10 imulm r60 r30 → r50
result: no change, since guard is false

initial values: r20 = 1, r60 = 0x10001000, r30 = 0xf1100000
operation: if r20 imulm r60 r30 → r90
result: r90 ← 0xff10ff11

initial values: r70 = 0xffffff00, r40 = 0x64
operation: imulm r70 r40 → r100
result: r100 ← 0xffffffff

see also: umulm dspimul dspumul dspidualmul quadumulmsb fmul
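imul and imulm return the two halves of the same signed 64-bit product. The Python sketch below models both; it relies on Python's arbitrary-precision integers to hold the full product, and `signed32` is a hypothetical helper introduced for the example.

```python
MASK32 = 0xFFFFFFFF


def signed32(value):
    """Reinterpret a 32-bit register pattern as a signed integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def imul(rsrc1, rsrc2):
    """Least-significant 32 bits of the signed 64-bit product."""
    return (signed32(rsrc1) * signed32(rsrc2)) & MASK32


def imulm(rsrc1, rsrc2):
    """Most-significant 32 bits of the signed 64-bit product."""
    product = signed32(rsrc1) * signed32(rsrc2)
    return (product >> 32) & MASK32     # >> is an arithmetic shift in Python


# -256 * 100 = -25600: the low half holds the value, the high half is all ones.
print(hex(imul(0xFFFFFF00, 0x64)), hex(imulm(0xFFFFFF00, 0x64)))
```

Taken together, `(imulm(a, b) << 32) | imul(a, b)` reconstructs the full 64-bit two's-complement product in this model.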
ineg
signed negate
pseudo-op for isub

syntax
[ if rguard ] ineg rsrc1 → rdest

function
if rguard then
   rdest ← -rsrc1

attributes
function unit: alu
operation code: 13
number of operands: 1
modifier: no
modifier range: -
latency: 1
issue slots: 1, 2, 3, 4, 5

description
the ineg operation is a pseudo operation transformed by the scheduler into an isub with r0 (always contains 0) as the first argument and rsrc1 as the second argument. (note: pseudo operations cannot be used in assembly source files.)

the ineg operation computes the negative of rsrc1 and writes the result into rdest. the argument is a signed integer; the result is an unsigned integer. if rsrc1 = 0x80000000, then ineg returns 0x80000000 since the positive value is not representable.

the ineg operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

examples

initial values: r30 = 0xffffffff
operation: ineg r30 → r60
result: r60 ← 0x00000001

initial values: r10 = 0, r40 = 0xfffffff4
operation: if r10 ineg r40 → r80
result: no change, since guard is false

initial values: r20 = 1, r40 = 0xfffffff4
operation: if r20 ineg r40 → r90
result: r90 ← 0xc

initial values: r50 = 0x80000001
operation: ineg r50 → r100
result: r100 ← 0x7fffffff

initial values: r60 = 0x80000000
operation: ineg r60 → r110
result: r110 ← 0x80000000

initial values: r20 = 1
operation: ineg r20 → r120
result: r120 ← 0xffffffff

see also: isub
ineq
signed compare not equal

syntax
[ if rguard ] ineq rsrc1 rsrc2 → rdest

function
if rguard then {
   if rsrc1 != rsrc2 then
      rdest ← 1
   else
      rdest ← 0
}

attributes
function unit: alu
operation code: 39
number of operands: 2
modifier: no
modifier range: -
latency: 1
issue slots: 1, 2, 3, 4, 5

description
the ineq operation sets the destination register, rdest, to 1 if the two arguments, rsrc1 and rsrc2, are not equal; otherwise, rdest is set to 0.

the ineq operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

examples

initial values: r30 = 3, r40 = 4
operation: ineq r30 r40 → r80
result: r80 ← 1

initial values: r10 = 0, r60 = 0x1000, r30 = 3
operation: if r10 ineq r60 r30 → r50
result: no change, since guard is false

initial values: r20 = 1, r50 = 0x1000, r60 = 0x1000
operation: if r20 ineq r50 r60 → r90
result: r90 ← 0

initial values: r70 = 0x80000000, r40 = 4
operation: ineq r70 r40 → r100
result: r100 ← 1

initial values: r70 = 0x80000000
operation: ineq r70 r70 → r110
result: r110 ← 0

see also: ieql igtr ineqi
ineqi
signed compare not equal with immediate

syntax
[ if rguard ] ineqi(n) rsrc1 → rdest

function
if rguard then {
   if rsrc1 != n then
      rdest ← 1
   else
      rdest ← 0
}

attributes
function unit: alu
operation code: 3
number of operands: 1
modifier: 7 bits
modifier range: -64..63
latency: 1
issue slots: 1, 2, 3, 4, 5

description
the ineqi operation sets the destination register, rdest, to 1 if the first argument, rsrc1, is not equal to the opcode modifier, n; otherwise, rdest is set to 0. the arguments are treated as signed integers.

the ineqi operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

examples

initial values: r30 = 3
operation: ineqi(2) r30 → r80
result: r80 ← 1

initial values: r30 = 3
operation: ineqi(3) r30 → r90
result: r90 ← 0

initial values: r30 = 3
operation: ineqi(4) r30 → r100
result: r100 ← 1

initial values: r10 = 0, r40 = 0x100
operation: if r10 ineqi(63) r40 → r50
result: no change, since guard is false

initial values: r20 = 1, r40 = 0x100
operation: if r20 ineqi(63) r40 → r100
result: r100 ← 1

initial values: r60 = 0xffffffc0
operation: ineqi(-64) r60 → r120
result: r120 ← 0

see also: ineq igeqi ieqli
inonzero
if nonzero select zero

syntax
[ if rguard ] inonzero rsrc1 rsrc2 → rdest

function
if rguard then {
   if rsrc1 != 0 then
      rdest ← 0
   else
      rdest ← rsrc2
}

attributes
function unit: alu
operation code: 47
number of operands: 2
modifier: no
modifier range: -
latency: 1
issue slots: 1, 2, 3, 4, 5

description
the inonzero operation writes 0 into rdest if the value of rsrc1 is not zero; otherwise, rsrc2 is copied to rdest. the operands are considered signed integers.

the inonzero operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

examples

initial values: r30 = 2, r20 = 1
operation: inonzero r30 r20 → r80
result: r80 ← 0

initial values: r10 = 0, r60 = 0x100, r30 = 2
operation: if r10 inonzero r60 r30 → r50
result: no change, since guard is false

initial values: r20 = 1, r60 = 0x100, r40 = 0xffffff9c
operation: if r20 inonzero r60 r40 → r90
result: r90 ← 0

initial values: r10 = 0, r40 = 0xffffff9c
operation: inonzero r10 r40 → r100
result: r100 ← 0xffffff9c

initial values: r20 = 1, r60 = 0x100
operation: inonzero r20 r60 → r110
result: r110 ← 0

initial values: r10 = 0, r70 = 0x456789
operation: inonzero r10 r70 → r120
result: r120 ← 0x456789

see also: izero iflip
isub
subtract

syntax
[ if rguard ] isub rsrc1 rsrc2 → rdest

function
if rguard then
   rdest ← rsrc1 - rsrc2

attributes
function unit: alu
operation code: 13
number of operands: 2
modifier: no
modifier range: -
latency: 1
issue slots: 1, 2, 3, 4, 5

description
the isub operation computes the difference rsrc1 - rsrc2 and writes the result into rdest. the operands can be either both signed or unsigned integers. no overflow or underflow detection is performed.

the isub operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

examples

initial values: r30 = 3, r40 = 4
operation: isub r30 r40 → r80
result: r80 ← 0xffffffff

initial values: r10 = 0, r60 = 0x100, r30 = 3
operation: if r10 isub r60 r30 → r50
result: no change, since guard is false

initial values: r20 = 1, r50 = 0x1000, r60 = 0x100
operation: if r20 isub r50 r60 → r90
result: r90 ← 0xf00

initial values: r70 = 0x80000000, r40 = 4
operation: isub r70 r40 → r100
result: r100 ← 0x7ffffffc

see also: isubi borrow dspisub dspidualsub fsub
isubi
subtract with immediate

syntax
[ if rguard ] isubi(n) rsrc1 → rdest

function
if rguard then
   rdest ← rsrc1 - n

attributes
function unit: alu
operation code: 32
number of operands: 1
modifier: 7 bits
modifier range: 0..127
latency: 1
issue slots: 1, 2, 3, 4, 5

description
the isubi operation computes the difference of a single argument in rsrc1 and an immediate modifier n and stores the result in rdest. the value of n must be between 0 and 127, inclusive.

the isubi operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is unchanged.

examples

initial values: r30 = 0xf11
operation: isubi(127) r30 → r70
result: r70 ← 0xe92

initial values: r10 = 0, r40 = 0xffffff9c
operation: if r10 isubi(1) r40 → r80
result: no change, since guard is false

initial values: r20 = 1, r40 = 0xffffff9c
operation: if r20 isubi(1) r40 → r90
result: r90 ← 0xffffff9b

initial values: r50 = 0x1000
operation: isubi(15) r50 → r120
result: r120 ← 0x0ff1

initial values: r60 = 0xfffffff0
operation: isubi(2) r60 → r110
result: r110 ← 0xffffffee

initial values: r20 = 1
operation: isubi(17) r20 → r120
result: r120 ← 0xfffffff0

see also: isub borrow
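The subtract operations, and the ineg pseudo-op that is built on isub, all reduce to 32-bit modular arithmetic with no overflow detection. The following Python sketch shows that under assumed names; the explicit range check on n reflects the stated 0..127 modifier range, and raising an exception for an out-of-range n is a choice made for this example only.

```python
MASK32 = 0xFFFFFFFF


def isub(rsrc1, rsrc2):
    """rdest <- rsrc1 - rsrc2, wrapping modulo 2**32."""
    return (rsrc1 - rsrc2) & MASK32


def isubi(n, rsrc1):
    """rdest <- rsrc1 - n, where n is the unsigned 7-bit modifier."""
    if not 0 <= n <= 127:
        raise ValueError("n must be in 0..127")
    return isub(rsrc1, n)


def ineg(rsrc1):
    """Pseudo-op: isub with r0 (always 0) as the first argument."""
    return isub(0, rsrc1)


print(hex(isub(3, 4)))            # wraps to the pattern for -1
print(hex(ineg(0x80000000)))      # most negative value negates to itself
```

The wraparound is why the same isub serves both signed and unsigned operands: the bit pattern of the result is identical in either interpretation.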
izero
if zero select zero

syntax
[ if rguard ] izero rsrc1 rsrc2 → rdest

function
if rguard then {
   if rsrc1 = 0 then
      rdest ← 0
   else
      rdest ← rsrc2
}

attributes
function unit: alu
operation code: 46
number of operands: 2
modifier: no
modifier range: -
latency: 1
issue slots: 1, 2, 3, 4, 5

description
the izero operation writes 0 into rdest if the value of rsrc1 is equal to zero; otherwise, rsrc2 is copied to rdest. the operands are considered signed integers.

the izero operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

examples

initial values: r30 = 2, r20 = 1
operation: izero r30 r20 → r80
result: r80 ← 1

initial values: r10 = 0, r60 = 0x100, r30 = 2
operation: if r10 izero r60 r30 → r50
result: no change, since guard is false

initial values: r20 = 1, r60 = 0x100, r40 = 0xffffff9c
operation: if r20 izero r60 r40 → r90
result: r90 ← 0xffffff9c

initial values: r10 = 0, r40 = 0xffffff9c
operation: izero r10 r40 → r100
result: r100 ← 0

initial values: r20 = 1, r60 = 0x100
operation: izero r20 r60 → r110
result: r110 ← 0x100

initial values: r20 = 1, r70 = 0x456789
operation: izero r20 r70 → r120
result: r120 ← 0x456789

see also: inonzero iflip
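izero and inonzero are complementary selects: each either passes rsrc2 through or forces the result to 0, depending on whether rsrc1 is zero. A small Python sketch of the pair is given below; the optional `guard` and `rdest_old` parameters are assumptions added to show the guarded form.

```python
def izero(rsrc1, rsrc2, guard=1, rdest_old=0):
    """rdest <- 0 if rsrc1 is zero, else rsrc2; unchanged if guard lsb is 0."""
    if not guard & 1:
        return rdest_old
    return 0 if rsrc1 == 0 else rsrc2


def inonzero(rsrc1, rsrc2, guard=1, rdest_old=0):
    """rdest <- 0 if rsrc1 is nonzero, else rsrc2; unchanged if guard lsb is 0."""
    if not guard & 1:
        return rdest_old
    return 0 if rsrc1 != 0 else rsrc2


# For any condition c, exactly one of the two passes the value through.
for c in (0, 1):
    print(c, hex(izero(c, 0x456789)), hex(inonzero(c, 0x456789)))
```

In this model, `izero(c, a) | inonzero(c, b)` selects a when c is nonzero and b when c is zero, which illustrates how the pair could express a branch-free conditional move.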
jmpf
indirect jump on false

syntax
[ if rguard ] jmpf rsrc1 rsrc2

function
if rguard then {
   if (rsrc1 & 1) = 0 then
      pc ← rsrc2
}

attributes
function unit: branch
operation code: 180
number of operands: 2
modifier: no
modifier range: -
delay: 3
issue slots: 2, 3, 4

description
the jmpf operation conditionally changes the program flow. if the lsb of rsrc1 is 0, the pc register is set equal to rsrc2; otherwise, program execution continues with the next sequential instruction.

the jmpf operation optionally takes a guard, specified in rguard. if a guard is present, its lsb adds another condition to the jump. if the lsb of rguard is 1, the instruction executes as previously described; otherwise, the jump will not be taken regardless of the value of rsrc1.

examples

initial values: r50 = 0, r70 = 0x330
operation: jmpf r50 r70
result: program execution continues at 0x330

initial values: r20 = 1, r70 = 0x330
operation: jmpf r20 r70
result: since r20 is true, program execution continues with next sequential instruction

initial values: r30 = 0, r50 = 0, r60 = 0x8000
operation: if r30 jmpf r50 r60
result: since guard is false, program execution continues with next sequential instruction

initial values: r40 = 1, r50 = 0, r60 = 0x8000
operation: if r40 jmpf r50 r60
result: program execution continues at 0x8000

see also: jmpt jmpi ijmpf ijmpt ijmpi
jmpi
jump immediate

syntax
[ if rguard ] jmpi(address)

function
if rguard then
   pc ← address

attributes
function unit: branch
operation code: 178
number of operands: 0
modifier: 32 bits
modifier range: 0..0xffffffff
delay: 3
issue slots: 2, 3, 4

description
the jmpi operation changes the program flow by setting the pc register equal to the immediate opcode modifier address.

the jmpi operation optionally takes a guard, specified in rguard. if a guard is present, its lsb adds a condition to the jump. if the lsb of rguard is 1, the instruction executes as previously described; otherwise, the jump will not be taken.

examples

initial values: (none)
operation: jmpi(0x330)
result: program execution continues at 0x330

initial values: r30 = 0
operation: if r30 jmpi(0x8000)
result: since guard is false, program execution continues with next sequential instruction

initial values: r40 = 1
operation: if r40 jmpi(0x8000)
result: program execution continues at 0x8000

see also: jmpf jmpt ijmpf ijmpt ijmpi
jmpt
indirect jump on true

syntax
[ if rguard ] jmpt rsrc1 rsrc2

function
if rguard then {
   if (rsrc1 & 1) = 1 then
      pc ← rsrc2
}

attributes
function unit: branch
operation code: 176
number of operands: 2
modifier: no
modifier range: -
delay: 3
issue slots: 2, 3, 4

description
the jmpt operation conditionally changes the program flow. if the lsb of rsrc1 is 1, the pc register is set equal to rsrc2; otherwise, program execution continues with the next sequential instruction.

the jmpt operation optionally takes a guard, specified in rguard. if a guard is present, its lsb adds another condition to the jump. if the lsb of rguard is 1, the instruction executes as previously described; otherwise, the jump will not be taken regardless of the value of rsrc1.

examples

initial values: r50 = 1, r70 = 0x330
operation: jmpt r50 r70
result: program execution continues at 0x330

initial values: r20 = 0, r70 = 0x330
operation: jmpt r20 r70
result: since r20 is false, program execution continues with next sequential instruction

initial values: r30 = 0, r50 = 1, r60 = 0x8000
operation: if r30 jmpt r50 r60
result: since guard is false, program execution continues with next sequential instruction

initial values: r40 = 1, r50 = 1, r60 = 0x8000
operation: if r40 jmpt r50 r60
result: program execution continues at 0x8000

see also: jmpf jmpi ijmpf ijmpt ijmpi
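The three jump operations share one decision: the guard must allow the operation, and for jmpt and jmpf the lsb of rsrc1 must also match. The Python sketch below models only that decision. The function name is hypothetical, `None` stands for "continue with the next sequential instruction", and the 3-cycle jump delay listed in the attributes is deliberately not modeled.

```python
def jump_target(op, rsrc1=0, rsrc2=0, address=0, guard=1):
    """Return the new pc if the jump is taken, or None if it is not."""
    if not guard & 1:
        return None                          # guard false: never taken
    if op == "jmpi":
        return address                       # unconditional apart from the guard
    if op == "jmpt":
        return rsrc2 if rsrc1 & 1 else None  # taken when lsb of rsrc1 is 1
    if op == "jmpf":
        return rsrc2 if not rsrc1 & 1 else None  # taken when lsb is 0
    raise ValueError("unknown operation: " + op)


print(jump_target("jmpt", rsrc1=1, rsrc2=0x330))           # 816, i.e. 0x330
print(jump_target("jmpf", rsrc1=1, rsrc2=0x330))           # None: not taken
print(jump_target("jmpi", address=0x8000, guard=0))        # None: guard false
```

Only bit 0 of rsrc1 and of the guard is examined, so a value such as 2 counts as false for both purposes.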
ld32  32-bit load
pseudo-op for ld32d(0)

syntax
  [ if rguard ] ld32 rsrc1 -> rdest

function
  if rguard then {
    if pcsw.bytesex = little_endian then bs <- 3 else bs <- 0
    rdest<7:0>   <- mem[rsrc1 + (3 xor bs)]
    rdest<15:8>  <- mem[rsrc1 + (2 xor bs)]
    rdest<23:16> <- mem[rsrc1 + (1 xor bs)]
    rdest<31:24> <- mem[rsrc1 + (0 xor bs)]
  }

attributes
  function unit        dmem
  operation code       7
  number of operands   1
  modifier             no
  modifier range       -
  latency              3
  issue slots          4, 5

description
  the ld32 operation is a pseudo operation transformed by the scheduler into an ld32d(0) with the same argument. (note: pseudo operations cannot be used in assembly source files.)
  the ld32 operation loads the 32-bit memory value from the address contained in rsrc1 and stores the result in rdest. if the memory address contained in rsrc1 is not a multiple of 4, the result of ld32 is undefined but no exception will be raised. this load operation is performed as little-endian or big-endian depending on the current setting of the bytesex bit in the pcsw.
  the ld32 operation can be used to access the mmio address aperture (the result of mmio access by 8- or 16-bit memory operations is undefined). the state of the bsx bit in the pcsw has no effect on mmio access by ld32.
  the ld32 operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register and the occurrence of side effects. if the lsb of rguard is 1, rdest is written and the data cache status bits are updated if the addressed locations are cacheable. if the lsb of rguard is 0, rdest is not changed and ld32 has no side effects whatever.

examples
  initial values | operation | result
  r10 = 0xd00, [0xd00] = 0x84, [0xd01] = 0x33, [0xd02] = 0x22, [0xd03] = 0x11 | ld32 r10 -> r60 | r60 <- 0x84332211
  r30 = 0, r20 = 0xd04, [0xd04] = 0x48, [0xd05] = 0x66, [0xd06] = 0x55, [0xd07] = 0x44 | if r30 ld32 r20 -> r70 | no change, since guard is false
  r40 = 1, r20 = 0xd04, [0xd04] = 0x48, [0xd05] = 0x66, [0xd06] = 0x55, [0xd07] = 0x44 | if r40 ld32 r20 -> r80 | r80 <- 0x48665544
  r50 = 0xd01 | ld32 r50 -> r90 | r90 undefined, since 0xd01 is not a multiple of 4

see also
  ld32d ld32r ld32x st32 st32d h_st32d
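The byte-ordering rule in the ld32 function can be modelled as below. This is a sketch under stated assumptions: the byte index and bs are taken to be combined with exclusive-or, `mem` is a hypothetical dict from byte address to byte value, and returning `None` is just one way to represent the undefined result of a misaligned access.

```python
def ld32(mem, addr, little_endian):
    """Assemble a 32-bit word from four bytes of mem (illustrative model).

    mem: hypothetical dict mapping byte addresses to values 0..255.
    Returns None when addr is not a multiple of 4 (result undefined).
    """
    if addr % 4 != 0:
        return None
    bs = 3 if little_endian else 0
    word = 0
    for k in range(4):
        # rdest<8k+7:8k> <- mem[addr + ((3 - k) xor bs)]
        word |= mem[addr + ((3 - k) ^ bs)] << (8 * k)
    return word
```

With the bytes of the first example at 0xd00, the big-endian setting gives 0x84332211 as in the table; the same bytes read little-endian would give 0x11223384.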
ld32d  32-bit load with displacement

syntax
  [ if rguard ] ld32d(d) rsrc1 -> rdest

function
  if rguard then {
    if pcsw.bytesex = little_endian then bs <- 3 else bs <- 0
    rdest<7:0>   <- mem[rsrc1 + d + (3 xor bs)]
    rdest<15:8>  <- mem[rsrc1 + d + (2 xor bs)]
    rdest<23:16> <- mem[rsrc1 + d + (1 xor bs)]
    rdest<31:24> <- mem[rsrc1 + d + (0 xor bs)]
  }

attributes
  function unit        dmem
  operation code       7
  number of operands   1
  modifier             7 bits
  modifier range       -256..252 by 4
  latency              3
  issue slots          4, 5

description
  the ld32d operation loads the 32-bit memory value from the address computed by rsrc1 + d and stores the result in rdest. the d value is an opcode modifier, must be in the range -256 to 252 inclusive, and must be a multiple of 4. if the memory address computed by rsrc1 + d is not a multiple of 4, the result of ld32d is undefined but no exception will be raised. this load operation is performed as little-endian or big-endian depending on the current setting of the bytesex bit in the pcsw.
  the ld32d operation can be used to access the mmio address aperture (the result of mmio access by 8- or 16-bit memory operations is undefined). the state of the bsx bit in the pcsw has no effect on mmio access by ld32d.
  the ld32d operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register and the occurrence of side effects. if the lsb of rguard is 1, rdest is written and the data cache status bits are updated if the addressed locations are cacheable. if the lsb of rguard is 0, rdest is not changed and ld32d has no side effects whatever.

examples
  initial values | operation | result
  r10 = 0xcfc, [0xd00] = 0x84, [0xd01] = 0x33, [0xd02] = 0x22, [0xd03] = 0x11 | ld32d(4) r10 -> r60 | r60 <- 0x84332211
  r30 = 0, r20 = 0xd0c, [0xd04] = 0x48, [0xd05] = 0x66, [0xd06] = 0x55, [0xd07] = 0x44 | if r30 ld32d(-8) r20 -> r70 | no change, since guard is false
  r40 = 1, r20 = 0xd0c, [0xd04] = 0x48, [0xd05] = 0x66, [0xd06] = 0x55, [0xd07] = 0x44 | if r40 ld32d(-8) r20 -> r80 | r80 <- 0x48665544
  r50 = 0xd01 | ld32d(-8) r50 -> r90 | r90 undefined, since 0xd01 + (-8) is not a multiple of 4

see also
  ld32 ld32r ld32x st32 st32d h_st32d
ld32r  32-bit load with index

syntax
  [ if rguard ] ld32r rsrc1 rsrc2 -> rdest

function
  if rguard then {
    if pcsw.bytesex = little_endian then bs <- 3 else bs <- 0
    rdest<7:0>   <- mem[rsrc1 + rsrc2 + (3 xor bs)]
    rdest<15:8>  <- mem[rsrc1 + rsrc2 + (2 xor bs)]
    rdest<23:16> <- mem[rsrc1 + rsrc2 + (1 xor bs)]
    rdest<31:24> <- mem[rsrc1 + rsrc2 + (0 xor bs)]
  }

attributes
  function unit        dmem
  operation code       200
  number of operands   2
  modifier             no
  modifier range       -
  latency              3
  issue slots          4, 5

description
  the ld32r operation loads the 32-bit memory value from the address computed by rsrc1 + rsrc2 and stores the result in rdest. if the memory address computed by rsrc1 + rsrc2 is not a multiple of 4, the result of ld32r is undefined but no exception will be raised. this load operation is performed as little-endian or big-endian depending on the current setting of the bytesex bit in the pcsw.
  the ld32r operation can be used to access the mmio address aperture (the result of mmio access by 8- or 16-bit memory operations is undefined). the state of the bsx bit in the pcsw has no effect on mmio access by ld32r.
  the ld32r operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register and the occurrence of side effects. if the lsb of rguard is 1, rdest is written and the data cache status bits are updated if the addressed locations are cacheable. if the lsb of rguard is 0, rdest is not changed and ld32r has no side effects whatever.

examples
  initial values | operation | result
  r10 = 0xcfc, r20 = 0x4, [0xd00] = 0x84, [0xd01] = 0x33, [0xd02] = 0x22, [0xd03] = 0x11 | ld32r r10 r20 -> r80 | r80 <- 0x84332211
  r50 = 0, r40 = 0xd0c, r30 = 0xfffffff8, [0xd04] = 0x48, [0xd05] = 0x66, [0xd06] = 0x55, [0xd07] = 0x44 | if r50 ld32r r40 r30 -> r90 | no change, since guard is false
  r60 = 1, r40 = 0xd0c, r30 = 0xfffffff8, [0xd04] = 0x48, [0xd05] = 0x66, [0xd06] = 0x55, [0xd07] = 0x44 | if r60 ld32r r40 r30 -> r100 | r100 <- 0x48665544
  r70 = 0xd01, r30 = 0xfffffff8 | ld32r r70 r30 -> r110 | r110 undefined, since 0xd01 + (-8) is not a multiple of 4

see also
  ld32 ld32d ld32x st32 st32d h_st32d
ld32x  32-bit load with scaled index

syntax
  [ if rguard ] ld32x rsrc1 rsrc2 -> rdest

function
  if rguard then {
    if pcsw.bytesex = little_endian then bs <- 3 else bs <- 0
    rdest<7:0>   <- mem[rsrc1 + (4 x rsrc2) + (3 xor bs)]
    rdest<15:8>  <- mem[rsrc1 + (4 x rsrc2) + (2 xor bs)]
    rdest<23:16> <- mem[rsrc1 + (4 x rsrc2) + (1 xor bs)]
    rdest<31:24> <- mem[rsrc1 + (4 x rsrc2) + (0 xor bs)]
  }

attributes
  function unit        dmem
  operation code       201
  number of operands   2
  modifier             no
  modifier range       -
  latency              3
  issue slots          4, 5

description
  the ld32x operation loads the 32-bit memory value from the address computed by rsrc1 + 4 x rsrc2 and stores the result in rdest. if the memory address computed by rsrc1 + 4 x rsrc2 is not a multiple of 4, the result of ld32x is undefined but no exception will be raised. this load operation is performed as little-endian or big-endian depending on the current setting of the bytesex bit in the pcsw.
  the ld32x operation can be used to access the mmio address aperture (the result of mmio access by 8- or 16-bit memory operations is undefined). the state of the bsx bit in the pcsw has no effect on mmio access by ld32x.
  the ld32x operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register and the occurrence of side effects. if the lsb of rguard is 1, rdest is written and the data cache status bits are updated if the addressed locations are cacheable. if the lsb of rguard is 0, rdest is not changed and ld32x has no side effects whatever.

examples
  initial values | operation | result
  r10 = 0xcfc, r30 = 0x1, [0xd00] = 0x84, [0xd01] = 0x33, [0xd02] = 0x22, [0xd03] = 0x11 | ld32x r10 r30 -> r100 | r100 <- 0x84332211
  r50 = 0, r40 = 0xd0c, r20 = 0xfffffffe, [0xd04] = 0x48, [0xd05] = 0x66, [0xd06] = 0x55, [0xd07] = 0x44 | if r50 ld32x r40 r20 -> r80 | no change, since guard is false
  r60 = 1, r40 = 0xd0c, r20 = 0xfffffffe, [0xd04] = 0x48, [0xd05] = 0x66, [0xd06] = 0x55, [0xd07] = 0x44 | if r60 ld32x r40 r20 -> r90 | r90 <- 0x48665544
  r70 = 0xd01, r30 = 0x1 | ld32x r70 r30 -> r110 | r110 undefined, since 0xd01 + 4 x 1 is not a multiple of 4

see also
  ld32 ld32d ld32r st32 st32d h_st32d
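The three addressing forms (ld32d, ld32r, ld32x) differ only in how the effective address is formed. The sketch below computes those addresses in Python, assuming 32-bit wraparound arithmetic (suggested by the examples, where 0xfffffff8 acts as -8). All function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # assumption: addresses wrap modulo 2**32


def ea_ld32d(rsrc1, d):
    """Effective address for ld32d(d): rsrc1 + d, with d checked against its range."""
    if not (-256 <= d <= 252 and d % 4 == 0):
        raise ValueError("d must be a multiple of 4 in -256..252")
    return (rsrc1 + d) & MASK32


def ea_ld32r(rsrc1, rsrc2):
    """Effective address for ld32r: rsrc1 + rsrc2."""
    return (rsrc1 + rsrc2) & MASK32


def ea_ld32x(rsrc1, rsrc2):
    """Effective address for ld32x: rsrc1 + 4 x rsrc2."""
    return (rsrc1 + 4 * rsrc2) & MASK32


def is_word_aligned(addr):
    """True if a 32-bit load from addr has a defined result."""
    return addr % 4 == 0
```

Under these assumptions, `ea_ld32x(0xd0c, 0xfffffffe)` and `ea_ld32r(0xd0c, 0xfffffff8)` both resolve to 0xd04, the address used in the guarded examples.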
lsl  logical shift left
pseudo-op for asl

syntax
  [ if rguard ] lsl rsrc1 rsrc2 -> rdest

function
  if rguard then {
    n <- rsrc2<4:0>
    rdest<31:n> <- rsrc1<31-n:0>
    rdest<n-1:0> <- 0
    if rsrc2<31:5> != 0 {
      rdest <- 0
    }
  }

attributes
  function unit        shifter
  operation code       19
  number of operands   2
  modifier             no
  modifier range       -
  latency              1
  issue slots          1, 2

description
  the lsl operation is a pseudo operation that is transformed by the scheduler into an asl with the same arguments. (note: pseudo operations cannot be used in assembly source files.)
  as shown below, the lsl operation takes two arguments, rsrc1 and rsrc2. rsrc2 specifies an unsigned shift amount, and rdest is set to rsrc1 logically shifted left by this amount. if the rsrc2<31:5> value is not zero, this is treated as a shift by 32 or more bits and rdest is set to zero. zeros are shifted into the lsbs of rdest while the msbs shifted out of rsrc1 are lost.
  the lsl operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is unchanged.

  [figure: left shifter; the 32 bits of rsrc1 are shifted left by the amount in rsrc2, with zeros entering at the lsbs of rdest (example: n = 3)]

examples
  initial values | operation | result
  r60 = 0x20, r30 = 3 | lsl r60 r30 -> r90 | r90 <- 0x100
  r10 = 0, r60 = 0x20, r30 = 3 | if r10 lsl r60 r30 -> r100 | no change, since guard is false
  r20 = 1, r60 = 0x20, r30 = 3 | if r20 lsl r60 r30 -> r110 | r110 <- 0x100
  r70 = 0xfffffffc, r40 = 2 | lsl r70 r40 -> r120 | r120 <- 0xfffffff0
  r80 = 0xe, r50 = 0xfffffffe | lsl r80 r50 -> r125 | r125 <- 0x00000000 (shift by more than 32)
  r30 = 0x7008000f, r45 = 0x20 | lsl r30 r45 -> r100 | r100 <- 0x00000000
  r30 = 0x8008000f, r45 = 0x80000000 | lsl r30 r45 -> r100 | r100 <- 0x00000000
  r30 = 0x8008000f, r45 = 0x23 | lsl r30 r45 -> r100 | r100 <- 0x00000000

see also
  asl asli asr asri lsli lsr lsri rol roli
lsli  logical shift left immediate
pseudo-op for asli

syntax
  [ if rguard ] lsli(n) rsrc1 -> rdest

function
  if rguard then {
    rdest<31:n> <- rsrc1<31-n:0>
    rdest<n-1:0> <- 0
  }

attributes
  function unit        shifter
  operation code       11
  number of operands   1
  modifier             7 bits
  modifier range       0..31
  latency              1
  issue slots          1, 2

description
  the lsli operation is a pseudo operation that is transformed by the scheduler into an asli with the same argument and opcode modifier. (note: pseudo operations cannot be used in assembly source files.)
  as shown below, the lsli operation takes a single argument in rsrc1 and an immediate modifier n and produces a result in rdest equal to rsrc1 logically shifted left by n bits. the value of n must be between 0 and 31, inclusive. zeros are shifted into the lsbs of rdest while the msbs shifted out of rsrc1 are lost.
  the lsli operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is unchanged.

  [figure: left shifter; the 32 bits of rsrc1 are shifted left by the amount n from the operation modifier, with zeros entering at the lsbs of rdest (example: n = 3)]

examples
  initial values | operation | result
  r60 = 0x20 | lsli(3) r60 -> r90 | r90 <- 0x100
  r10 = 0, r60 = 0x20 | if r10 lsli(3) r60 -> r100 | no change, since guard is false
  r20 = 1, r60 = 0x20 | if r20 lsli(3) r60 -> r110 | r110 <- 0x100
  r70 = 0xfffffffc | lsli(2) r70 -> r120 | r120 <- 0xfffffff0
  r80 = 0xe | lsli(30) r80 -> r125 | r125 <- 0x80000000

see also
  asl asli asr asri lsl lsr lsri rol roli
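The lsl and lsli functions, including the rule that any set bit in rsrc2<31:5> forces a zero result, can be checked against the example tables with a small Python model. The function names simply mirror the mnemonics, and guards are left out for brevity.

```python
MASK32 = 0xFFFFFFFF


def lsl(rsrc1, rsrc2):
    """Logical shift left by a register amount (illustrative model)."""
    if (rsrc2 & MASK32) >> 5:          # rsrc2<31:5> != 0: shift by 32 or more
        return 0
    n = rsrc2 & 0x1F                   # n <- rsrc2<4:0>
    return (rsrc1 << n) & MASK32       # msbs shifted out are lost


def lsli(rsrc1, n):
    """Logical shift left by an immediate n in 0..31 (illustrative model)."""
    if not 0 <= n <= 31:
        raise ValueError("n must be in 0..31")
    return (rsrc1 << n) & MASK32
```

For instance, `lsl(0xe, 0xfffffffe)` returns 0 because the upper bits of the shift amount are non-zero, while `lsli(0xe, 30)` keeps only the bit that lands in position 31.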
lsr  logical shift right

syntax
  [ if rguard ] lsr rsrc1 rsrc2 -> rdest

function
  if rguard then {
    n <- rsrc2<4:0>
    rdest<31:32-n> <- 0
    rdest<31-n:0> <- rsrc1<31:n>
    if rsrc2<31:5> != 0 {
      rdest <- 0
    }
  }

attributes
  function unit        shifter
  operation code       96
  number of operands   2
  modifier             no
  modifier range       -
  latency              1
  issue slots          1, 2

description
  as shown below, the lsr operation takes two arguments, rsrc1 and rsrc2. rsrc2 specifies an unsigned shift amount, and rsrc1 is logically shifted right by this amount. if the rsrc2<31:5> value is not zero, this is treated as a shift by 32 or more bits and rdest is set to zero. zeros fill vacated bits from the left.
  the lsr operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is unchanged.

  [figure: right shifter; the 32 bits of rsrc1 are shifted right by the amount in rsrc2, with zeros entering at the msbs of rdest (example: n = 3)]

examples
  initial values | operation | result
  r30 = 0x7008000f, r20 = 1 | lsr r30 r20 -> r50 | r50 <- 0x38040007
  r30 = 0x7008000f, r42 = 2 | lsr r30 r42 -> r60 | r60 <- 0x1c020003
  r10 = 0, r30 = 0x7008000f, r44 = 4 | if r10 lsr r30 r44 -> r70 | no change, since guard is false
  r20 = 1, r30 = 0x7008000f, r44 = 4 | if r20 lsr r30 r44 -> r80 | r80 <- 0x07008000
  r40 = 0x80030007, r44 = 4 | lsr r40 r44 -> r90 | r90 <- 0x08003000
  r30 = 0x7008000f, r45 = 0x1f | lsr r30 r45 -> r100 | r100 <- 0x00000000
  r30 = 0x8008000f, r45 = 0x1f | lsr r30 r45 -> r100 | r100 <- 0x00000001
  r30 = 0x7008000f, r45 = 0x20 | lsr r30 r45 -> r100 | r100 <- 0x00000000
  r30 = 0x8008000f, r45 = 0x80000000 | lsr r30 r45 -> r100 | r100 <- 0x00000000
  r30 = 0x8008000f, r45 = 0x23 | lsr r30 r45 -> r100 | r100 <- 0x00000000

see also
  asl asli asr asri lsl lsli lsri rol roli
lsri  logical shift right immediate

syntax
  [ if rguard ] lsri(n) rsrc1 -> rdest

function
  if rguard then {
    rdest<31:32-n> <- 0
    rdest<31-n:0> <- rsrc1<31:n>
  }

attributes
  function unit        shifter
  operation code       9
  number of operands   1
  modifier             7 bits
  modifier range       0..31
  latency              1
  issue slots          1, 2

description
  as shown below, the lsri operation takes a single argument in rsrc1 and an immediate modifier n and produces a result in rdest that is equal to rsrc1 logically shifted right by n bits. the value of n must be between 0 and 31, inclusive. zeros fill vacated bits from the left.
  the lsri operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is unchanged.

  [figure: right shifter; the 32 bits of rsrc1 are shifted right by the amount n from the operation modifier, with zeros entering at the msbs of rdest (example: n = 3)]

examples
  initial values | operation | result
  r30 = 0x7008000f | lsri(1) r30 -> r50 | r50 <- 0x38040007
  r30 = 0x7008000f | lsri(2) r30 -> r60 | r60 <- 0x1c020003
  r10 = 0, r30 = 0x7008000f | if r10 lsri(4) r30 -> r70 | no change, since guard is false
  r20 = 1, r30 = 0x7008000f | if r20 lsri(4) r30 -> r80 | r80 <- 0x07008000
  r40 = 0x80030007 | lsri(4) r40 -> r90 | r90 <- 0x08003000
  r30 = 0x7008000f | lsri(31) r30 -> r100 | r100 <- 0x00000000
  r40 = 0x80030007 | lsri(31) r40 -> r110 | r110 <- 0x00000001

see also
  asl asli asr asri lsl lsli lsr rol roli
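A matching sketch for the right shifts follows. As with the left-shift model, the Python names mirror the mnemonics and are illustrative only; guards are not modelled. The point to notice is that the shift is logical: the sign bit is never replicated.

```python
MASK32 = 0xFFFFFFFF


def lsr(rsrc1, rsrc2):
    """Logical shift right by a register amount (illustrative model)."""
    if (rsrc2 & MASK32) >> 5:              # rsrc2<31:5> != 0: shift by 32 or more
        return 0
    return (rsrc1 & MASK32) >> (rsrc2 & 0x1F)  # zeros enter from the left


def lsri(rsrc1, n):
    """Logical shift right by an immediate n in 0..31 (illustrative model)."""
    if not 0 <= n <= 31:
        raise ValueError("n must be in 0..31")
    return (rsrc1 & MASK32) >> n
```

`lsr(0x8008000f, 0x1f)` returns 1 rather than 0xffffffff, which is the difference from an arithmetic shift such as asr.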
mergedual16lsb  merge dual 16-bit lsb bytes

syntax
  [ if rguard ] mergedual16lsb rsrc1 rsrc2 -> rdest

function
  if rguard then {
    rdest<31:24> <- rsrc1<23:16>
    rdest<23:16> <- rsrc1<7:0>
    rdest<15:8>  <- rsrc2<23:16>
    rdest<7:0>   <- rsrc2<7:0>
  }

attributes
  function unit        shifter
  operation code       103
  number of operands   2
  modifier             no
  modifier range       -
  latency              1
  issue slots          1, 2

description
  the arguments rsrc1 and rsrc2 are each treated as a vector of two 16-bit values. the mergedual16lsb operation merges the least-significant byte of each 16-bit value in rsrc1 and rsrc2 into one 32-bit word in rdest, converting dual 16-bit data to quad 8-bit data.
  the mergedual16lsb operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

  [figure: byte lanes of rsrc1 and rsrc2 mapped onto rdest]

examples
  initial values | operation | result
  r30 = 0x12345678, r40 = 0xaabbccdd | mergedual16lsb r30 r40 -> r50 | r50 <- 0x3478bbdd
  r10 = 0, r30 = 0x12345678, r40 = 0xaabbccdd | if r10 mergedual16lsb r30 r40 -> r50 | no change, since guard is false
  r10 = 1, r30 = 0x01020304, r40 = 0x0a0b0c0d | if r10 mergedual16lsb r30 r40 -> r50 | r50 <- 0x02040b0d

see also
  mergelsb mergemsb pack16lsb pack16msb
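The byte selection of mergedual16lsb is easy to get wrong by one lane, so a small Python model of the function may help. The helper `byte` and the function name are hypothetical; byte 0 is bits <7:0> and byte 3 is bits <31:24>.

```python
def byte(x, i):
    """Return byte i of a 32-bit value (i = 0 is bits <7:0>)."""
    return (x >> (8 * i)) & 0xFF


def mergedual16lsb(rsrc1, rsrc2):
    """Keep the low byte of each 16-bit half of rsrc1 and rsrc2 (illustrative model)."""
    return ((byte(rsrc1, 2) << 24) |   # rdest<31:24> <- rsrc1<23:16>
            (byte(rsrc1, 0) << 16) |   # rdest<23:16> <- rsrc1<7:0>
            (byte(rsrc2, 2) << 8) |    # rdest<15:8>  <- rsrc2<23:16>
            byte(rsrc2, 0))            # rdest<7:0>   <- rsrc2<7:0>
```

With rsrc1 = 0x12345678 and rsrc2 = 0xaabbccdd the model selects 0x34, 0x78, 0xbb and 0xdd, in that order from the most-significant byte down.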
mergelsb  merge least-significant byte

syntax
  [ if rguard ] mergelsb rsrc1 rsrc2 -> rdest

function
  if rguard then {
    rdest<7:0>   <- rsrc2<7:0>
    rdest<15:8>  <- rsrc1<7:0>
    rdest<23:16> <- rsrc2<15:8>
    rdest<31:24> <- rsrc1<15:8>
  }

attributes
  function unit        alu
  operation code       57
  number of operands   2
  modifier             no
  modifier range       -
  latency              1
  issue slots          1, 2, 3, 4, 5

description
  as shown below, the mergelsb operation interleaves the two pairs of least-significant bytes from the arguments rsrc1 and rsrc2 into rdest. the least-significant byte from rsrc2 is packed into the least-significant byte of rdest; the least-significant byte from rsrc1 is packed into the second-least-significant byte of rdest; the second-least-significant byte from rsrc2 is packed into the second-most-significant byte of rdest; and the second-least-significant byte from rsrc1 is packed into the most-significant byte of rdest.
  the mergelsb operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is unchanged.

  [figure: byte lanes of rsrc1 and rsrc2 mapped onto rdest]

examples
  initial values | operation | result
  r30 = 0x12345678, r40 = 0xaabbccdd | mergelsb r30 r40 -> r50 | r50 <- 0x56cc78dd
  r10 = 0, r40 = 0xaabbccdd, r30 = 0x12345678 | if r10 mergelsb r40 r30 -> r60 | no change, since guard is false
  r20 = 1, r40 = 0xaabbccdd, r30 = 0x12345678 | if r20 mergelsb r40 r30 -> r70 | r70 <- 0xcc56dd78

see also
  pack16lsb pack16msb packbytes mergemsb
mergemsb  merge most-significant byte

syntax
  [ if rguard ] mergemsb rsrc1 rsrc2 -> rdest

function
  if rguard then {
    rdest<7:0>   <- rsrc2<23:16>
    rdest<15:8>  <- rsrc1<23:16>
    rdest<23:16> <- rsrc2<31:24>
    rdest<31:24> <- rsrc1<31:24>
  }

attributes
  function unit        alu
  operation code       58
  number of operands   2
  modifier             no
  modifier range       -
  latency              1
  issue slots          1, 2, 3, 4, 5

description
  as shown below, the mergemsb operation interleaves the two pairs of most-significant bytes from the arguments rsrc1 and rsrc2 into rdest. the second-most-significant byte from rsrc2 is packed into the least-significant byte of rdest; the second-most-significant byte from rsrc1 is packed into the second-least-significant byte of rdest; the most-significant byte from rsrc2 is packed into the second-most-significant byte of rdest; and the most-significant byte from rsrc1 is packed into the most-significant byte of rdest.
  the mergemsb operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is unchanged.

  [figure: byte lanes of rsrc1 and rsrc2 mapped onto rdest]

examples
  initial values | operation | result
  r30 = 0x12345678, r40 = 0xaabbccdd | mergemsb r30 r40 -> r50 | r50 <- 0x12aa34bb
  r10 = 0, r40 = 0xaabbccdd, r30 = 0x12345678 | if r10 mergemsb r40 r30 -> r60 | no change, since guard is false
  r20 = 1, r40 = 0xaabbccdd, r30 = 0x12345678 | if r20 mergemsb r40 r30 -> r70 | r70 <- 0xaa12bb34

see also
  pack16lsb pack16msb packbytes mergelsb
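The interleaving done by mergelsb and mergemsb can be expressed with one shared helper, as in the Python sketch below. The helper `interleave` and its argument order are hypothetical; the byte ranges used for mergemsb are <23:16> and <31:24>, which is what the description and the examples imply.

```python
def byte(x, i):
    """Return byte i of a 32-bit value (i = 0 is bits <7:0>)."""
    return (x >> (8 * i)) & 0xFF


def interleave(rsrc1, rsrc2, lo, hi):
    """Interleave bytes lo and hi of rsrc1 and rsrc2, rsrc1 in the upper lane of each pair."""
    return ((byte(rsrc1, hi) << 24) | (byte(rsrc2, hi) << 16) |
            (byte(rsrc1, lo) << 8) | byte(rsrc2, lo))


def mergelsb(rsrc1, rsrc2):
    """Interleave the two least-significant bytes of each argument."""
    return interleave(rsrc1, rsrc2, 0, 1)


def mergemsb(rsrc1, rsrc2):
    """Interleave the two most-significant bytes of each argument."""
    return interleave(rsrc1, rsrc2, 2, 3)
```

Swapping the arguments swaps the lanes within each byte pair, which is why `mergelsb(r40, r30)` in the tables gives 0xcc56dd78 rather than 0x56cc78dd.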
nop  no operation

syntax
  nop

function
  no operation

attributes
  function unit        -
  operation code       -
  number of operands   -
  modifier             -
  modifier range       -
  latency              1
  issue slots          1-5

description
  the nop operation does not change any dspcpu state. it is mainly used to fill up empty issue slots. only two bits are used to code the nop operation.

examples
  initial values | operation | result
  r30 = 0x12345678, r40 = 0xaabbccdd | nop | no change in any registers

see also
  -
pack16lsb  pack least-significant 16-bit halfwords

syntax
  [ if rguard ] pack16lsb rsrc1 rsrc2 -> rdest

function
  if rguard then {
    rdest<15:0>  <- rsrc2<15:0>
    rdest<31:16> <- rsrc1<15:0>
  }

attributes
  function unit        alu
  operation code       53
  number of operands   2
  modifier             no
  modifier range       -
  latency              1
  issue slots          1, 2, 3, 4, 5

description
  as shown below, the pack16lsb operation packs the two least-significant halfwords from the arguments rsrc1 and rsrc2 into rdest. the halfword from rsrc1 is packed into the most-significant halfword of rdest; the halfword from rsrc2 is packed into the least-significant halfword of rdest.
  the pack16lsb operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is unchanged.

  [figure: halfword lanes of rsrc1 and rsrc2 mapped onto rdest]

examples
  initial values | operation | result
  r30 = 0x12345678, r40 = 0xaabbccdd | pack16lsb r30 r40 -> r50 | r50 <- 0x5678ccdd
  r10 = 0, r40 = 0xaabbccdd, r30 = 0x12345678 | if r10 pack16lsb r40 r30 -> r60 | no change, since guard is false
  r20 = 1, r40 = 0xaabbccdd, r30 = 0x12345678 | if r20 pack16lsb r40 r30 -> r70 | r70 <- 0xccdd5678

see also
  pack16msb packbytes mergelsb mergemsb
pack16msb  pack most-significant 16 bits

syntax
  [ if rguard ] pack16msb rsrc1 rsrc2 -> rdest

function
  if rguard then {
    rdest<15:0>  <- rsrc2<31:16>
    rdest<31:16> <- rsrc1<31:16>
  }

attributes
  function unit        alu
  operation code       54
  number of operands   2
  modifier             no
  modifier range       -
  latency              1
  issue slots          1, 2, 3, 4, 5

description
  as shown below, the pack16msb operation packs the two most-significant halfwords from the arguments rsrc1 and rsrc2 into rdest. the halfword from rsrc1 is packed into the most-significant halfword of rdest; the halfword from rsrc2 is packed into the least-significant halfword of rdest.
  the pack16msb operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is unchanged.

  [figure: halfword lanes of rsrc1 and rsrc2 mapped onto rdest]

examples
  initial values | operation | result
  r30 = 0x12345678, r40 = 0xaabbccdd | pack16msb r30 r40 -> r50 | r50 <- 0x1234aabb
  r10 = 0, r40 = 0xaabbccdd, r30 = 0x12345678 | if r10 pack16msb r40 r30 -> r60 | no change, since guard is false
  r20 = 1, r40 = 0xaabbccdd, r30 = 0x12345678 | if r20 pack16msb r40 r30 -> r70 | r70 <- 0xaabb1234

see also
  pack16lsb packbytes mergelsb mergemsb
packbytes  pack least-significant byte

syntax
  [ if rguard ] packbytes rsrc1 rsrc2 -> rdest

function
  if rguard then {
    rdest<7:0>  <- rsrc2<7:0>
    rdest<15:8> <- rsrc1<7:0>
  }

attributes
  function unit        alu
  operation code       52
  number of operands   2
  modifier             no
  modifier range       -
  latency              1
  issue slots          1, 2, 3, 4, 5

description
  as shown below, the packbytes operation packs the two least-significant bytes from the arguments rsrc1 and rsrc2 into rdest. the byte from rsrc1 is packed into the second-least-significant byte of rdest; the byte from rsrc2 is packed into the least-significant byte of rdest. the two most-significant bytes of rdest are filled with zeros.
  the packbytes operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is unchanged.

  [figure: least-significant bytes of rsrc1 and rsrc2 mapped onto the lower half of rdest; the upper 16 bits of rdest are zero]

examples
  initial values | operation | result
  r30 = 0x12345678, r40 = 0xaabbccdd | packbytes r30 r40 -> r50 | r50 <- 0x000078dd
  r10 = 0, r40 = 0xaabbccdd, r30 = 0x12345678 | if r10 packbytes r40 r30 -> r60 | no change, since guard is false
  r20 = 1, r40 = 0xaabbccdd, r30 = 0x12345678 | if r20 packbytes r40 r30 -> r70 | r70 <- 0x0000dd78

see also
  pack16lsb pack16msb mergelsb mergemsb
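The three pack operations (pack16lsb, pack16msb, packbytes) share one pattern: rsrc1 supplies the upper lane and rsrc2 the lower lane. A compact Python model, with function names mirroring the mnemonics and guards omitted, makes that visible.

```python
def pack16lsb(rsrc1, rsrc2):
    """rdest<31:16> <- rsrc1<15:0>, rdest<15:0> <- rsrc2<15:0> (illustrative model)."""
    return ((rsrc1 & 0xFFFF) << 16) | (rsrc2 & 0xFFFF)


def pack16msb(rsrc1, rsrc2):
    """rdest<31:16> <- rsrc1<31:16>, rdest<15:0> <- rsrc2<31:16> (illustrative model)."""
    return (rsrc1 & 0xFFFF0000) | ((rsrc2 >> 16) & 0xFFFF)


def packbytes(rsrc1, rsrc2):
    """rdest<15:8> <- rsrc1<7:0>, rdest<7:0> <- rsrc2<7:0>, upper half zero (illustrative model)."""
    return ((rsrc1 & 0xFF) << 8) | (rsrc2 & 0xFF)
```

Because the lanes are fixed by argument position, exchanging rsrc1 and rsrc2 exchanges the two halves of the result, as the guarded rows of each example table show.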
pref  prefetch
pseudo-op for prefd(0)

syntax
  [ if rguard ] pref rsrc1

function
  if rguard then {
    cache_block_mask = ~(cache_block_size - 1)
    data_cache <- mem[(rsrc1 + 0) & cache_block_mask]
  }

attributes
  function unit        dmemspec
  operation code       209
  number of operands   1
  modifier             -
  modifier range       -
  latency              -
  issue slots          5

description
  the pref operation is a pseudo operation transformed by the scheduler into a prefd(0) with the same arguments. (note: pseudo operations cannot be used in assembly files.)
  the pref operation loads one full cache block from main memory at the address computed by ((rsrc1 + 0) & cache_block_mask) and stores the data into the data cache. this operation is not guaranteed to be executed. the prefetch unit will not execute this operation when the data to be prefetched is already in the data cache. a pref operation is also not executed if, at the time it is issued, the cache is already occupied with 2 cache misses.
  the pref operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the execution of the prefetch operation. if the lsb of rguard is 1, the prefetch operation is executed; otherwise, it is not executed.
  note: this operation may only be supported in tm-1000, tm-1100, tm-1300 and pnx1300/01/02/11. it is not guaranteed to be available in future generations of trimedia products.

examples
  initial values | operation | result
  r10 = 0xabcd, cache_block_size = 0x40 | pref r10 | loads a cache line for the address space from 0xabc0 to 0xabff from the main memory. if the data is already in the cache, the operation is not executed.
  r10 = 0xabcd, r11 = 0, cache_block_size = 0x40 | if r11 pref r10 | since guard is false, pref operation is not executed
  r10 = 0xabff, r11 = 1, cache_block_size = 0x40 | if r11 pref r10 | loads a cache line for the address space from 0xabc0 to 0xabff from the main memory. if the data is already in the cache, the operation is not executed.

see also
  pref16x pref32x prefd prefr allocd allocr allocx
pref16x  prefetch with 16-bit scaled index

syntax
  [ if rguard ] pref16x rsrc1 rsrc2

function
  if rguard then {
    cache_block_mask = ~(cache_block_size - 1)
    data_cache <- mem[(rsrc1 + (2 x rsrc2)) & cache_block_mask]
  }

attributes
  function unit        dmemspec
  operation code       211
  number of operands   2
  modifier             no
  modifier range       -
  latency              -
  issue slots          5

description
  the pref16x operation loads one full cache block from main memory at the address computed by ((rsrc1 + (2 x rsrc2)) & cache_block_mask) and stores the data into the data cache. this operation is not guaranteed to be executed. the prefetch unit will not execute this operation when the data to be prefetched is already in the data cache. the data cache has hardware to simultaneously sustain two cache misses or prefetches. a pref16x operation is not executed if, at the time it is issued, the cache is already occupied with 2 cache misses.
  the pref16x operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the execution of the prefetch operation. if the lsb of rguard is 1, the prefetch operation is executed; otherwise, it is not executed.
  note: this operation may only be supported in tm-1000, tm-1100, tm-1300 and pnx1300/01/02/11. it is not guaranteed to be available in future generations of trimedia products.

examples
  initial values | operation | result
  r10 = 0xabcd, r12 = 0xc, cache_block_size = 0x40 | pref16x r10 r12 | loads a cache line for the address space from 0xabc0 to 0xabff from the main memory. if the data is already in the cache, the operation is not executed.
  r10 = 0xabcd, r11 = 0, r12 = 0xc, cache_block_size = 0x40 | if r11 pref16x r10 r12 | since guard is false, pref16x operation is not executed
  r10 = 0xabff, r11 = 1, r12 = 0x1, cache_block_size = 0x40 | if r11 pref16x r10 r12 | loads a cache line for the address space from 0xac00 to 0xac3f from the main memory. if the data is already in the cache, the operation is not executed.

see also
  pref32x prefd prefr allocd allocr allocx
pref32x  prefetch with 32-bit scaled index

syntax
  [ if rguard ] pref32x rsrc1 rsrc2

function
  if rguard then {
    cache_block_mask = ~(cache_block_size - 1)
    data_cache <- mem[(rsrc1 + (4 x rsrc2)) & cache_block_mask]
  }

attributes
  function unit        dmemspec
  operation code       212
  number of operands   2
  modifier             no
  modifier range       -
  latency              -
  issue slots          5

description
  the pref32x operation loads one full cache block from main memory at the address computed by ((rsrc1 + (4 x rsrc2)) & cache_block_mask) and stores the data into the data cache. this operation is not guaranteed to be executed. the prefetch unit will not execute this operation when the data to be prefetched is already in the data cache. a pref32x operation is also not executed if, at the time it is issued, the cache is already occupied with 2 cache misses.
  the pref32x operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the execution of the prefetch operation. if the lsb of rguard is 1, the prefetch operation is executed; otherwise, it is not executed.
  note: this operation may only be supported in tm-1000, tm-1100, tm-1300 and pnx1300/01/02/11. it is not guaranteed to be available in future generations of trimedia products.

examples
  initial values | operation | result
  r10 = 0xabcd, r12 = 0xd, cache_block_size = 0x40 | pref32x r10 r12 | loads a cache line for the address space from 0xac00 to 0xac3f from the main memory. if the data is already in the cache, the operation is not executed.
  r10 = 0xabcd, r11 = 0, r12 = 0xd, cache_block_size = 0x40 | if r11 pref32x r10 r12 | since guard is false, pref32x operation is not executed
  r10 = 0xabff, r11 = 1, r12 = 0x1, cache_block_size = 0x40 | if r11 pref32x r10 r12 | loads a cache line for the address space from 0xac00 to 0xac3f from the main memory. if the data is already in the cache, the operation is not executed.

see also
  pref16x prefd prefr allocd allocr allocx
prefd  prefetch with displacement

syntax
  [ if rguard ] prefd(d) rsrc1

function
  if rguard then {
    cache_block_mask = ~(cache_block_size - 1)
    data_cache <- mem[(rsrc1 + d) & cache_block_mask]
  }

attributes
  function unit        dmemspec
  operation code       209
  number of operands   1
  modifier             7 bits
  modifier range       -256..252 by 4
  latency              -
  issue slots          5

description
  the prefd operation loads one full cache block from main memory at the address computed by ((rsrc1 + d) & cache_block_mask) and stores the data into the data cache. this operation is not guaranteed to be executed. the prefetch unit will not execute this operation when the data to be prefetched is already in the data cache. a prefd operation is also not executed if, at the time it is issued, the cache is already occupied with 2 cache misses.
  the prefd operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the execution of the prefetch operation. if the lsb of rguard is 1, the prefetch operation is executed; otherwise, it is not executed.
  note: this operation may only be supported in tm-1000, tm-1100, tm-1300 and pnx1300/01/02/11. it is not guaranteed to be available in future generations of trimedia products.

examples
  initial values | operation | result
  r10 = 0xabcd, cache_block_size = 0x40 | prefd(0xd) r10 | loads a cache line for the address space from 0xabc0 to 0xabff from the main memory. if the data is already in the cache, the operation is not executed.
  r10 = 0xabcd, r11 = 0, cache_block_size = 0x40 | if r11 prefd(0xd) r10 | since guard is false, prefd operation is not executed
  r10 = 0xabff, r11 = 1, cache_block_size = 0x40 | if r11 prefd(0x1) r10 | loads a cache line for the address space from 0xac00 to 0xac3f from the main memory. if the data is already in the cache, the operation is not executed.

see also
  pref16x pref32x prefr allocd allocr allocx
prefr
prefetch with index

syntax
    [ if rguard ] prefr rsrc1 rsrc2

function
    if rguard then {
        cache_block_mask = ~(cache_block_size - 1)
        data_cache <- mem[(rsrc1 + rsrc2) & cache_block_mask]
    }

attributes
    function unit: dmemspec
    operation code: 210
    number of operands: 2
    modifier: no
    modifier range: -
    latency: -
    issue slots: 5

description
the prefr operation loads one full cache block of memory from the address computed by ((rsrc1 + rsrc2) & cache_block_mask) and stores the data into the data cache. this operation is not guaranteed to be executed. the prefetch unit will not execute this operation when the data to be prefetched is already in the data cache. a prefr operation will also not be executed when the cache is already occupied with 2 cache misses at the time the operation is issued.
the prefr operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the execution of the prefetch operation. if the lsb of rguard is 1, the prefetch operation is executed; otherwise, it is not executed.
note: this operation may only be supported in tm-1000, tm-1100, tm-1300 and pnx1300/01/02/11. it is not guaranteed to be available in future generations of trimedia products.

examples
initial values | operation | result
r10 = 0xabcd, r12 = 0xd, cache_block_size = 0x40 | prefr r10 r12 | loads a cache line for the address space from 0xabc0 to 0xabff from the main memory. if the data is already in the cache, the operation is not executed.
r10 = 0xabcd, r11 = 0, r12 = 0xd, cache_block_size = 0x40 | if r11 prefr r10 r12 | since guard is false, prefr operation is not executed
r10 = 0xabff, r11 = 1, r12 = 0x1, cache_block_size = 0x40 | if r11 prefr r10 r12 | loads a cache line for the address space from 0xac00 to 0xac3f from the main memory. if the data is already in the cache, the operation is not executed.

see also
pref16x pref32x prefd allocd allocr allocx
quadavg
unsigned byte-wise quad average

syntax
    [ if rguard ] quadavg rsrc1 rsrc2 rdest

function
    if rguard then {
        temp <- (zero_ext8to32(rsrc1<7:0>) + zero_ext8to32(rsrc2<7:0>) + 1) / 2
        rdest<7:0> <- temp<7:0>
        temp <- (zero_ext8to32(rsrc1<15:8>) + zero_ext8to32(rsrc2<15:8>) + 1) / 2
        rdest<15:8> <- temp<7:0>
        temp <- (zero_ext8to32(rsrc1<23:16>) + zero_ext8to32(rsrc2<23:16>) + 1) / 2
        rdest<23:16> <- temp<7:0>
        temp <- (zero_ext8to32(rsrc1<31:24>) + zero_ext8to32(rsrc2<31:24>) + 1) / 2
        rdest<31:24> <- temp<7:0>
    }

attributes
    function unit: dspalu
    operation code: 73
    number of operands: 2
    modifier: no
    modifier range: -
    latency: 2
    issue slots: 1, 3

description
the quadavg operation computes four separate averages of the four pairs of corresponding 8-bit bytes of rsrc1 and rsrc2. all bytes are considered unsigned. the least-significant 8 bits of each average are written to the corresponding byte in rdest. no overflow or underflow detection is performed.
the quadavg operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

[figure: quadavg operation; the four unsigned byte pairs of rsrc1 and rsrc2 are added with 1 to form four full-precision 9-bit sums, which are halved into the four bytes of rdest]

examples
initial values | operation | result
r30 = 0x0201000e, r40 = 0xffffff02 | quadavg r30 r40 r50 | r50 <- 0x81808008
r10 = 0, r60 = 0x9c9c6464, r70 = 0x649c649c | if r10 quadavg r60 r70 r80 | no change, since guard is false
r20 = 1, r60 = 0x9c9c6464, r70 = 0x649c649c | if r20 quadavg r60 r70 r90 | r90 <- 0x809c6480

see also
iavgonep dspuquadaddui ifir8ii
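the rounding behaviour of quadavg is easiest to see in a bit-level model. the following sketch follows the function pseudocode lane by lane; the python function name simply mirrors the operation name and is not part of any trimedia toolchain.

```python
def quadavg(rsrc1, rsrc2):
    """Software model of quadavg: per-byte unsigned (a + b + 1) / 2."""
    rdest = 0
    for shift in (0, 8, 16, 24):
        a = (rsrc1 >> shift) & 0xFF      # zero-extended byte of rsrc1
        b = (rsrc2 >> shift) & 0xFF      # zero-extended byte of rsrc2
        temp = (a + b + 1) // 2          # 9-bit sum, rounded up on ties
        rdest |= (temp & 0xFF) << shift  # keep temp<7:0>
    return rdest


print(hex(quadavg(0x0201000E, 0xFFFFFF02)))
print(hex(quadavg(0x9C9C6464, 0x649C649C)))
```

because the sum of two bytes plus one fits in 9 bits, the halved result always fits in 8 bits, which is why no overflow detection is needed.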
quadumax
unsigned byte-wise quad maximum

syntax
    [ if rguard ] quadumax rsrc1 rsrc2 rdest

function
    if rguard then {
        rdest<7:0> <- if rsrc1<7:0> > rsrc2<7:0> then rsrc1<7:0> else rsrc2<7:0>
        rdest<15:8> <- if rsrc1<15:8> > rsrc2<15:8> then rsrc1<15:8> else rsrc2<15:8>
        rdest<23:16> <- if rsrc1<23:16> > rsrc2<23:16> then rsrc1<23:16> else rsrc2<23:16>
        rdest<31:24> <- if rsrc1<31:24> > rsrc2<31:24> then rsrc1<31:24> else rsrc2<31:24>
    }

attributes
    function unit: dspalu
    operation code: 81
    number of operands: 2
    modifier: no
    modifier range: -
    latency: 2
    issue slots: 1, 3

description
the quadumax operation computes four separate maximum values of the four pairs of corresponding 8-bit bytes of rsrc1 and rsrc2. all bytes are considered unsigned. the quadumax operation is particularly suited to implementing median computation on packed pixel data structures:
    median_q(a,b,c) = (quadumin(quadumax(quadumin((a),(b)), (c)), quadumax((a),(b))))
the quadumax operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

examples
initial values | operation | result
r30 = 0x0201000e, r40 = 0xff00ff02 | quadumax r30 r40 r50 | r50 <- 0xff01ff0e
r10 = 0, r60 = 0x9c9c6464, r70 = 0x649d649c | if r10 quadumax r60 r70 r80 | no change, since guard is false
r20 = 1, r60 = 0x9c9c6464, r70 = 0x649d649c | if r20 quadumax r60 r70 r90 | r90 <- 0x9c9d649c

see also
imax imin quadumin
quadumin
unsigned byte-wise quad minimum

syntax
    [ if rguard ] quadumin rsrc1 rsrc2 rdest

function
    if rguard then {
        rdest<7:0> <- if rsrc1<7:0> < rsrc2<7:0> then rsrc1<7:0> else rsrc2<7:0>
        rdest<15:8> <- if rsrc1<15:8> < rsrc2<15:8> then rsrc1<15:8> else rsrc2<15:8>
        rdest<23:16> <- if rsrc1<23:16> < rsrc2<23:16> then rsrc1<23:16> else rsrc2<23:16>
        rdest<31:24> <- if rsrc1<31:24> < rsrc2<31:24> then rsrc1<31:24> else rsrc2<31:24>
    }

attributes
    function unit: dspalu
    operation code: 80
    number of operands: 2
    modifier: no
    modifier range: -
    latency: 2
    issue slots: 1, 3

description
the quadumin operation computes four separate minimum values of the four pairs of corresponding 8-bit bytes of rsrc1 and rsrc2. all bytes are considered unsigned. the quadumin operation is particularly suited to implementing median computation on packed pixel data structures:
    median_q(a,b,c) = (quadumin(quadumax(quadumin((a),(b)), (c)), quadumax((a),(b))))
the quadumin operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

examples
initial values | operation | result
r30 = 0x0201000e, r40 = 0xff00ff02 | quadumin r30 r40 r50 | r50 <- 0x02000002
r10 = 0, r60 = 0x9c9c6464, r70 = 0x649d649c | if r10 quadumin r60 r70 r80 | no change, since guard is false
r20 = 1, r60 = 0x9c9c6464, r70 = 0x649d649c | if r20 quadumin r60 r70 r90 | r90 <- 0x649c6464

see also
imin imax quadumax
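the median_q expression quoted in the quadumax and quadumin descriptions can be tried out in software. the sketch below models both operations on 32-bit words and composes them exactly as the expression does; the helper names _lanes and _pack are hypothetical and exist only for this illustration.

```python
def _lanes(word):
    """Split a 32-bit word into four unsigned bytes, least significant first."""
    return [(word >> s) & 0xFF for s in (0, 8, 16, 24)]


def _pack(lanes):
    """Inverse of _lanes."""
    return sum(b << s for b, s in zip(lanes, (0, 8, 16, 24)))


def quadumax(a, b):
    return _pack([max(x, y) for x, y in zip(_lanes(a), _lanes(b))])


def quadumin(a, b):
    return _pack([min(x, y) for x, y in zip(_lanes(a), _lanes(b))])


def median_q(a, b, c):
    """Per-byte median of three packed words, as given in the description."""
    return quadumin(quadumax(quadumin(a, b), c), quadumax(a, b))


print(hex(median_q(0x01020304, 0x04030201, 0x02020202)))
```

the composition works because min(max(min(a,b), c), max(a,b)) clamps c between the smaller and larger of a and b, which is the median of the three values; four operations therefore yield four pixel medians at once.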
quadumulmsb
unsigned quad 8-bit multiply most significant

syntax
    [ if rguard ] quadumulmsb rsrc1 rsrc2 rdest

function
    if rguard then {
        temp <- (zero_ext8to32(rsrc1<7:0>) * zero_ext8to32(rsrc2<7:0>))
        rdest<7:0> <- temp<15:8>
        temp <- (zero_ext8to32(rsrc1<15:8>) * zero_ext8to32(rsrc2<15:8>))
        rdest<15:8> <- temp<15:8>
        temp <- (zero_ext8to32(rsrc1<23:16>) * zero_ext8to32(rsrc2<23:16>))
        rdest<23:16> <- temp<15:8>
        temp <- (zero_ext8to32(rsrc1<31:24>) * zero_ext8to32(rsrc2<31:24>))
        rdest<31:24> <- temp<15:8>
    }

attributes
    function unit: dspmul
    operation code: 89
    number of operands: 2
    modifier: no
    modifier range: -
    latency: 3
    issue slots: 2, 3

description
the quadumulmsb operation computes four separate products of the four pairs of corresponding 8-bit bytes of rsrc1 and rsrc2. all bytes are considered unsigned. the most-significant 8 bits of each 16-bit product are written to the corresponding byte in rdest.
the quadumulmsb operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

[figure: quadumulmsb operation; the four unsigned byte pairs of rsrc1 and rsrc2 form four full-precision 16-bit products, and bits 15..8 of each product go to the corresponding byte of rdest]

examples
initial values | operation | result
r30 = 0x0210800e, r40 = 0xffffff02 | quadumulmsb r30 r40 r50 | r50 <- 0x010f7f00
r10 = 0, r60 = 0x80ff1010, r70 = 0x80ff100f | if r10 quadumulmsb r60 r70 r80 | no change, since guard is false
r20 = 1, r60 = 0x80ff1010, r70 = 0x80ff100f | if r20 quadumulmsb r60 r70 r90 | r90 <- 0x40fe0100

see also
quadavg dspuquadaddui ifir8ii
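a short model makes the truncation in quadumulmsb explicit. this sketch assumes the operator lost from the function pseudocode is an unsigned multiply, which is what the description and the example results indicate.

```python
def quadumulmsb(rsrc1, rsrc2):
    """Software model: per-byte unsigned product, keeping bits 15..8."""
    rdest = 0
    for shift in (0, 8, 16, 24):
        a = (rsrc1 >> shift) & 0xFF
        b = (rsrc2 >> shift) & 0xFF
        product = a * b                          # full 16-bit product
        rdest |= ((product >> 8) & 0xFF) << shift  # discard the low byte
    return rdest


print(hex(quadumulmsb(0x0210800E, 0xFFFFFF02)))
print(hex(quadumulmsb(0x80FF1010, 0x80FF100F)))
```

the low byte of each product is dropped without rounding, so 0x10 * 0x0f = 0x00f0 contributes 0x00 to the result.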
rdstatus
read data cache status bits

syntax
    [ if rguard ] rdstatus(d) rsrc1 rdest

function
    if rguard then {
        set_addr <- rsrc1 + d    /* set_addr<10:6> selects set */
        rdest<9:0> <- dcache_lru_set(set_addr)
        rdest<17:10> <- dcache_dirty_set(set_addr)
        rdest<31:18> <- 0
    }

attributes
    function unit: dmemspec
    operation code: 203
    number of operands: 1
    modifier: 7 bits
    modifier range: -256..252 by 4
    latency: 3
    issue slots: 5

description
the rdstatus operation reads the lru and dirty bits associated with a set in the data cache and writes these bits into the destination register rdest. the target set in the data cache is determined by bits 10..6 of the result of rsrc1 + d. the d value is an opcode modifier, must be in the range -256 to 252 inclusive, and must be a multiple of 4.
the result of rdstatus contains lru information in bits 9..0 and dirty-bit information in bits 17..10. all other bits of rdest are set to zero.
rdstatus requires two stall cycles to complete.
the dual-ported data cache uses two separate copies of tag and status information. a rdstatus operation returns the lru and dirty information stored in the cache port that corresponds to the operation slot in which the rdstatus operation is issued.
the rdstatus operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

examples
initial values | operation | result
 | rdstatus(0) r30 r60 |
r10 = 0 | if r10 rdstatus(4) r40 r70 | no change, since guard is false
r20 = 1 | if r20 rdstatus(8) r50 r80 |

see also
rdtag
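software that inspects the value returned by rdstatus has to pick the set and split the result into its two fields. the following sketch shows that bookkeeping under the bit positions stated in the description; the function names are hypothetical, and the meaning of the individual lru and dirty bits is not modelled because this page does not define it.

```python
def rdstatus_set_index(rsrc1, d):
    """Set selected by rdstatus: bits 10..6 of rsrc1 + d (assumed 32-bit)."""
    if not (-256 <= d <= 252) or d % 4 != 0:
        raise ValueError("d must be a multiple of 4 in -256..252")
    return (((rsrc1 + d) & 0xFFFFFFFF) >> 6) & 0x1F


def unpack_rdstatus(rdest):
    """Split an rdstatus result into its lru (9..0) and dirty (17..10) fields."""
    return {"lru": rdest & 0x3FF, "dirty": (rdest >> 10) & 0xFF}


print(rdstatus_set_index(0x7C0, 0))
print(unpack_rdstatus((0x81 << 10) | 0x155))
```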
rdtag
read data cache address tag

syntax
    [ if rguard ] rdtag(d) rsrc1 rdest

function
    if rguard then {
        block_addr <- rsrc1 + d    /* block_addr<13:11> selects element, block_addr<10:6> selects set */
        rdest<21:0> <- dcache_tag_block(block_addr)
        rdest<31:22> <- 0
    }

attributes
    function unit: dmemspec
    operation code: 202
    number of operands: 1
    modifier: 7 bits
    modifier range: -256..252 by 4
    latency: 3
    issue slots: 5

description
the rdtag operation reads the address tag associated with a block in the data cache and writes these bits into the destination register rdest. the target block in the data cache is determined by bits 13..6 of the result of rsrc1 + d. bits 10..6 of rsrc1 + d select the cache set and bits 13..11 of rsrc1 + d select the element within that set. the d value is an opcode modifier, must be in the range -256 to 252 inclusive, and must be a multiple of 4.
rdtag writes the address tag for the selected block in bits 21..0 of rdest. all other bits of rdest are set to zero.
rdtag requires no stall cycles to complete.
the dual-ported data cache uses two separate copies of tag and status information. a rdtag operation returns the address tag information stored in the cache port that corresponds to the operation slot in which the rdtag operation is issued.
the rdtag operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

examples
initial values | operation | result
 | rdtag(0) r30 r60 |
r10 = 0 | if r10 rdtag(4) r40 r70 | no change, since guard is false
r20 = 1 | if r20 rdtag(8) r50 r80 |

see also
rdstatus
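the block selection used by rdtag is a plain bit-field split of rsrc1 + d. the sketch below shows it; rdtag_select is a hypothetical name, and the model assumes 32-bit address arithmetic.

```python
def rdtag_select(rsrc1, d):
    """Return (element, set) addressed by rdtag(d) rsrc1.

    element = bits 13..11 of rsrc1 + d, set = bits 10..6.
    """
    block_addr = (rsrc1 + d) & 0xFFFFFFFF
    element = (block_addr >> 11) & 0x7
    set_index = (block_addr >> 6) & 0x1F
    return element, set_index


print(rdtag_select(0x2A40, 0))
```

to walk every block of one set, a caller would hold bits 10..6 fixed and step bits 13..11 through 0..7, that is, add 0x800 to rsrc1 between reads.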
readdpc
read destination program counter

syntax
    [ if rguard ] readdpc rdest

function
    if rguard then {
        rdest <- dpc
    }

attributes
    function unit: fcomp
    operation code: 156
    number of operands: 0
    modifier: no
    modifier range: -
    latency: 1
    issue slots: 3

description
the readdpc operation writes the current value of the dpc (destination program counter) processor register to rdest. interruptible jumps write their target address to the dpc. if an interrupt or exception is taken at an interruptible jump, execution of the interrupted program can be resumed by jumping to the value contained in dpc. this operation can be used to save state before idling a task in a multi-tasking environment.
the readdpc operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is unchanged.

examples
initial values | operation | result
dpc = 0xbeebee | readdpc r100 | r100 <- 0xbeebee
r20 = 0, dpc = 0xabba | if r20 readdpc r101 | no change, since guard is false
r21 = 1, dpc = 0xabba | if r21 readdpc r102 | r102 <- 0xabba

see also
writedpc readspc ijmpf ijmpi ijmpt
readpcsw
read program control and status word

syntax
    [ if rguard ] readpcsw rdest

function
    if rguard then {
        rdest <- pcsw
    }

attributes
    function unit: fcomp
    operation code: 158
    number of operands: 0
    modifier: no
    modifier range: -
    latency: 1
    issue slots: 3

description
the readpcsw operation writes the current value of the pcsw (program control and status word) processor register to rdest. the layout of pcsw is shown below. fields in the pcsw have two chief purposes: to control aspects of processor operation and to record events that occur during program execution. thus, readpcsw can be used to determine current processor operating modes and what events have occurred; this operation can also be used to save state before idling a task in a multi-tasking environment.
the readpcsw operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is unchanged.

pcsw layout
pcsw<15:0>
    bit 15      mse        misaligned store exception
    bit 14      undef      undefined
    bit 13      wbe        write back error
    bit 12      rse        reserved exception
    bit 11      cs         count stalls (1 = yes)
    bit 10      ien        interrupt enable (1 = allow interrupts)
    bit 9       bsx        byte sex (1 = little endian)
    bits 8..7   ieee mode  ieee rounding mode: 0 = to nearest, 1 = to zero, 2 = to positive, 3 = to negative
    bits 6..0   ofz ifz inv ovf unf inx dbz   fp exceptions
pcsw<31:16>
    bit 31      trpmse     misaligned store exception trap enable
    bit 30      tfe        trap on first exit
    bit 29      trpwbe     write back error trap enable
    bit 28      trprse     reserved exception trap enable
    bits 27..23 undef      undefined
    bits 22..16 trpofz trpifz trpinv trpovf trpunf trpinx trpdbz   fp exception trap-enable bits

examples
initial values | operation | result
pcsw = 0x80110642 | readpcsw r100 | r100 <- 0x80110642 (trap on mse, inv and dbz enabled, ien=1: interrupts enabled, bsx=1: little endian mode of operation, ofz=1: a denormalized result was produced somewhere, inx=1: an inexact result was produced somewhere)
r20 = 0, pcsw = 0x80000000 | if r20 readpcsw r101 | no change, since guard is false
r21 = 1, pcsw = 0x80000000 | if r21 readpcsw r102 | r102 <- 0x80000000 (trap on mse enabled)

see also
writepcsw
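a value returned by readpcsw is usually examined field by field. the sketch below decodes the flags that the first example confirms, plus their neighbours in the layout figure. the bit numbers are read off the flattened figure and should be treated as an assumption; the table name PCSW_FLAGS and the function name decode_pcsw are hypothetical, and wbe, rse, tfe and their trap enables are left out on purpose.

```python
# Bit positions reconstructed from the pcsw layout figure (assumption).
PCSW_FLAGS = {
    0: "dbz", 1: "inx", 2: "unf", 3: "ovf", 4: "inv", 5: "ifz", 6: "ofz",
    9: "bsx", 10: "ien", 11: "cs", 15: "mse",
    16: "trpdbz", 17: "trpinx", 18: "trpunf", 19: "trpovf",
    20: "trpinv", 21: "trpifz", 22: "trpofz", 31: "trpmse",
}


def decode_pcsw(pcsw):
    """Return (names of set flags in ascending bit order, ieee rounding mode)."""
    names = [name for bit, name in sorted(PCSW_FLAGS.items())
             if (pcsw >> bit) & 1]
    rounding_mode = (pcsw >> 7) & 0x3  # 0 = to nearest ... 3 = to negative
    return names, rounding_mode


print(decode_pcsw(0x80110642))
```

for the example value 0x80110642 the decoded names match the annotation in the examples table: inx, ofz, bsx, ien, and the trap enables for dbz, inv and mse.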
readspc
read source program counter

syntax
    [ if rguard ] readspc rdest

function
    if rguard then {
        rdest <- spc
    }

attributes
    function unit: fcomp
    operation code: 157
    number of operands: 0
    modifier: no
    modifier range: -
    latency: 1
    issue slots: 3

description
the readspc operation writes the current value of the spc (source program counter) processor register to rdest. an interruptible jump that is not interrupted (no nmi, int, or exc event was pending when the jump was executed) writes its target address to spc. the value of spc allows an exception-handling routine to determine the start address of the block of scheduled code (called a decision tree) that was executing before the exception was taken. this operation can be used to save state before idling a task in a multi-tasking environment.
the readspc operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is unchanged.

examples
initial values | operation | result
spc = 0xbeebee | readspc r100 | r100 <- 0xbeebee
r20 = 0, spc = 0xabba | if r20 readspc r101 | no change, since guard is false
r21 = 1, spc = 0xabba | if r21 readspc r102 | r102 <- 0xabba

see also
writespc readdpc ijmpf ijmpi ijmpt
rol
rotate left

syntax
    [ if rguard ] rol rsrc1 rsrc2 rdest

function
    if rguard then {
        n <- rsrc2<4:0>
        rdest<31:n> <- rsrc1<31-n:0>
        rdest<n-1:0> <- rsrc1<31:32-n>
    }

attributes
    function unit: shifter
    operation code: 97
    number of operands: 2
    modifier: no
    modifier range: -
    latency: 1
    issue slots: 1, 2

description
the rol operation takes two arguments, rsrc1 and rsrc2. the least-significant five bits of rsrc2 specify an unsigned rotate amount, and rdest is set to rsrc1 rotated left by this amount. the most-significant n bits of rsrc1, where n is the rotate amount, appear as the least-significant n bits in rdest.
the rol operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is unchanged.

[figure: rol operation; the five lsbs of rsrc2 give the rotate amount n for a left rotator applied to the 32 bits of rsrc1, producing rdest; intermediate result shown for the example n = 9]

examples
initial values | operation | result
r60 = 0x20, r30 = 3 | rol r60 r30 r90 | r90 <- 0x100
r10 = 0, r60 = 0x20, r30 = 3 | if r10 rol r60 r30 r100 | no change, since guard is false
r20 = 1, r60 = 0x20, r30 = 3 | if r20 rol r60 r30 r110 | r110 <- 0x100
r70 = 0xfffffffc, r40 = 2 | rol r70 r40 r120 | r120 <- 0xfffffff3
r80 = 0xe, r50 = 0xfffffffe | rol r80 r50 r125 | r125 <- 0x80000003 (r50 is effectively equal to 0x1e)

see also
roli asr asri lsl lsli lsr lsri
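the way rol reduces its rotate amount to five bits is worth seeing in code. the following is a minimal model, assuming 32-bit registers; it is a sketch for checking the examples, not a description of the shifter hardware.

```python
def rol(rsrc1, rsrc2):
    """Software model of rol: rotate rsrc1 left by rsrc2<4:0>."""
    n = rsrc2 & 0x1F                    # only the five lsbs are used
    rsrc1 &= 0xFFFFFFFF
    # For n == 0 the right shift by 32 yields 0, leaving rsrc1 unchanged.
    return ((rsrc1 << n) | (rsrc1 >> (32 - n))) & 0xFFFFFFFF


print(hex(rol(0x20, 3)))
print(hex(rol(0xFFFFFFFC, 2)))
print(hex(rol(0xE, 0xFFFFFFFE)))
```

the last call shows why 0xfffffffe behaves like 0x1e: all bits above bit 4 of the rotate amount are ignored. roli follows the same arithmetic with n taken from the operation modifier.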
roli
rotate left by immediate

syntax
    [ if rguard ] roli(n) rsrc1 rdest

function
    if rguard then {
        rdest<31:n> <- rsrc1<31-n:0>
        rdest<n-1:0> <- rsrc1<31:32-n>
    }

attributes
    function unit: shifter
    operation code: 98
    number of operands: 1
    modifier: 7 bits
    modifier range: 0..31
    latency: 1
    issue slots: 1, 2

description
the roli operation takes a single argument in rsrc1 and an immediate modifier n and produces a result in rdest equal to rsrc1 rotated left by n bits. the value of n must be between 0 and 31, inclusive. the most-significant n bits of rsrc1 appear as the least-significant n bits in rdest.
the roli operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is unchanged.

[figure: roli operation; the rotate amount n from the operation modifier drives a left rotator applied to the 32 bits of rsrc1, producing rdest; intermediate result shown for the example n = 9]

examples
initial values | operation | result
r60 = 0x20 | roli(3) r60 r90 | r90 <- 0x100
r10 = 0, r60 = 0x20 | if r10 roli(3) r60 r100 | no change, since guard is false
r20 = 1, r60 = 0x20 | if r20 roli(3) r60 r110 | r110 <- 0x100
r70 = 0xfffffffc | roli(2) r70 r120 | r120 <- 0xfffffff3
r80 = 0xe | roli(30) r80 r125 | r125 <- 0x80000003

see also
rol asl asli asr asri lsl lsli lsr lsri
sex16
sign extend 16 bits

syntax
    [ if rguard ] sex16 rsrc1 rdest

function
    if rguard then
        rdest <- sign_ext16to32(rsrc1<15:0>)

attributes
    function unit: alu
    operation code: 51
    number of operands: 1
    modifier: no
    modifier range: -
    latency: 1
    issue slots: 1, 2, 3, 4, 5

description
the sex16 operation sign extends the least-significant 16-bit halfword of the argument, rsrc1, to 32 bits and stores the result in rdest.
the sex16 operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

[figure: sex16 operation; bit 15 of rsrc1 (the sign bit s) is replicated into bits 31..16 of rdest, and bits 15..0 are copied unchanged]

examples
initial values | operation | result
r30 = 0xffff0040 | sex16 r30 r60 | r60 <- 0x00000040
r10 = 0, r40 = 0xff0fff91 | if r10 sex16 r40 r70 | no change, since guard is false
r20 = 1, r40 = 0xff0fff91 | if r20 sex16 r40 r100 | r100 <- 0xffffff91
r50 = 0x00000091 | sex16 r50 r110 | r110 <- 0x00000091

see also
zex16 sex8 zex8
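sign extension as performed by sex16 can be modelled with a single masked-or step. the sketch below uses a generic helper, sign_ext, that is a hypothetical name introduced for illustration; with a width of 8 it gives the behaviour described for sex8.

```python
def sign_ext(value, bits):
    """Sign extend the low `bits` bits of value to a 32-bit result."""
    low_mask = (1 << bits) - 1
    field = value & low_mask               # upper source bits are ignored
    if field & (1 << (bits - 1)):          # sign bit of the field set?
        field |= 0xFFFFFFFF & ~low_mask    # replicate it into the upper bits
    return field


def sex16(rsrc1):
    return sign_ext(rsrc1, 16)


print(hex(sex16(0xFFFF0040)))
print(hex(sex16(0xFF0FFF91)))
print(hex(sign_ext(0x00000091, 8)))
```

note that the upper half of the source never influences the result: 0xffff0040 extends to 0x00000040 because bit 15 is clear.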
sex8
sign extend 8 bits
pseudo-op for ibytesel

syntax
    [ if rguard ] sex8 rsrc1 rdest

function
    if rguard then
        rdest <- sign_ext8to32(rsrc1<7:0>)

attributes
    function unit: alu
    operation code: 56
    number of operands: 1
    modifier: no
    modifier range: -
    latency: 1
    issue slots: 1, 2, 3, 4, 5

description
the sex8 operation is a pseudo operation transformed by the scheduler into an ibytesel with rsrc1 as the first argument and r0 (always contains 0) as the second. (note: pseudo operations cannot be used in assembly source files.)
the sex8 operation sign extends the least-significant 8-bit byte of the argument, rsrc1, to 32 bits and writes the result in rdest.
the sex8 operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

[figure: sex8 operation; bit 7 of rsrc1 (the sign bit s) is replicated into bits 31..8 of rdest, and bits 7..0 are copied unchanged]

examples
initial values | operation | result
r30 = 0xffff0040 | sex8 r30 r60 | r60 <- 0x00000040
r10 = 0, r40 = 0xff0fff91 | if r10 sex8 r40 r70 | no change, since guard is false
r20 = 1, r40 = 0xff0fff91 | if r20 sex8 r40 r100 | r100 <- 0xffffff91
r50 = 0x00000091 | sex8 r50 r110 | r110 <- 0xffffff91

see also
ibytesel sex16 zex8 zex16
st16
16-bit store
pseudo-op for h_st16d(0)

syntax
    [ if rguard ] st16 rsrc1 rsrc2

function
    if rguard then {
        if pcsw.bytesex = little_endian then
            bs <- 1
        else
            bs <- 0
        mem[rsrc1 + (1 xor bs)] <- rsrc2<7:0>
        mem[rsrc1 + (0 xor bs)] <- rsrc2<15:8>
    }

attributes
    function unit: dmem
    operation code: 30
    number of operands: 2
    modifier: no
    modifier range: -
    latency: n/a
    issue slots: 4, 5

description
the st16 operation is a pseudo operation transformed by the scheduler into an h_st16d(0) with the same arguments. (note: pseudo operations cannot be used in assembly files.)
the st16 operation stores the least-significant 16-bit halfword of rsrc2 into the memory locations pointed to by the address in rsrc1. this store operation is performed as little-endian or big-endian depending on the current setting of the bytesex bit in the pcsw.
if st16 is misaligned (the memory address in rsrc1 is not a multiple of 2), the result of st16 is undefined, and the mse (misaligned store exception) bit in the pcsw register is set to 1. additionally, if the trpmse (trap on misaligned store exception) bit in pcsw is 1, exception processing will be requested on the next interruptible jump.
the result of an access by st16 to the mmio address aperture is undefined; access to the mmio aperture is defined only for 32-bit loads and stores.
the st16 operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the addressed memory locations (and the modification of cache if the locations are cacheable). if the lsb of rguard is 1, the store takes effect. if the lsb of rguard is 0, st16 has no side effects whatever; in particular, the lru and other status bits in the data cache are not affected.

examples
initial values | operation | result
r10 = 0xd00, r80 = 0x44332211 | st16 r10 r80 | [0xd00] <- 0x22, [0xd01] <- 0x11
r50 = 0, r20 = 0xd01, r70 = 0xaabbccdd | if r50 st16 r20 r70 | no change, since guard is false
r60 = 1, r30 = 0xd02, r70 = 0xaabbccdd | if r60 st16 r30 r70 | [0xd02] <- 0xcc, [0xd03] <- 0xdd

see also
st16d h_st16d st8 st8d st32 st32d
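the byte placement in the st16 function depends on combining each byte offset with bs. the sketch below assumes that the operator lost from the pseudocode is an exclusive or, which reproduces the big-endian results in the examples table; memory is modelled as a python dict, and raising an exception on misalignment is a choice of this sketch (the hardware result is undefined and the mse bit is set).

```python
def st16(mem, rsrc1, rsrc2, little_endian=False):
    """Software model of st16 on a dict-based memory (address -> byte)."""
    if rsrc1 % 2 != 0:
        # The data book leaves the result undefined; the sketch refuses.
        raise ValueError("misaligned 16-bit store")
    bs = 1 if little_endian else 0
    mem[rsrc1 + (1 ^ bs)] = rsrc2 & 0xFF          # rsrc2<7:0>
    mem[rsrc1 + (0 ^ bs)] = (rsrc2 >> 8) & 0xFF   # rsrc2<15:8>


big, little = {}, {}
st16(big, 0xD00, 0x44332211)
st16(little, 0xD00, 0x44332211, little_endian=True)
print(big, little)
```

xor-ing the offset with bs swaps the two byte addresses in little-endian mode and leaves them alone in big-endian mode, so one pair of assignments covers both byte orders.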
st16d
16-bit store with displacement
pseudo-op for h_st16d

syntax
    [ if rguard ] st16d(d) rsrc1 rsrc2

function
    if rguard then {
        if pcsw.bytesex = little_endian then
            bs <- 1
        else
            bs <- 0
        mem[rsrc1 + d + (1 xor bs)] <- rsrc2<7:0>
        mem[rsrc1 + d + (0 xor bs)] <- rsrc2<15:8>
    }

attributes
    function unit: dmem
    operation code: 30
    number of operands: 2
    modifier: 7 bits
    modifier range: -128..126 by 2
    latency: n/a
    issue slots: 4, 5

description
the st16d operation is a pseudo operation transformed by the scheduler into an h_st16d with the same arguments. (note: pseudo operations cannot be used in assembly files.)
the st16d operation stores the least-significant 16-bit halfword of rsrc2 into the memory locations pointed to by the address in rsrc1 + d. the d value is an opcode modifier, must be in the range -128 to 126 inclusive, and must be a multiple of 2. this store operation is performed as little-endian or big-endian depending on the current setting of the bytesex bit in the pcsw.
if st16d is misaligned (the memory address computed by rsrc1 + d is not a multiple of 2), the result of st16d is undefined, and the mse (misaligned store exception) bit in the pcsw register is set to 1. additionally, if the trpmse (trap on misaligned store exception) bit in pcsw is 1, exception processing will be requested on the next interruptible jump.
the result of an access by st16d to the mmio address aperture is undefined; access to the mmio aperture is defined only for 32-bit loads and stores.
the st16d operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the addressed memory locations (and the modification of cache if the locations are cacheable). if the lsb of rguard is 1, the store takes effect. if the lsb of rguard is 0, st16d has no side effects whatever; in particular, the lru and other status bits in the data cache are not affected.

examples
initial values | operation | result
r10 = 0xcfe, r80 = 0x44332211 | st16d(2) r10 r80 | [0xd00] <- 0x22, [0xd01] <- 0x11
r50 = 0, r20 = 0xd05, r70 = 0xaabbccdd | if r50 st16d(-4) r20 r70 | no change, since guard is false
r60 = 1, r30 = 0xd06, r70 = 0xaabbccdd | if r60 st16d(-4) r30 r70 | [0xd02] <- 0xcc, [0xd03] <- 0xdd

see also
st16 h_st16d st8 st8d st32 st32d
st32
32-bit store
pseudo-op for h_st32d(0)

syntax
    [ if rguard ] st32 rsrc1 rsrc2

function
    if rguard then {
        if pcsw.bytesex = little_endian then
            bs <- 3
        else
            bs <- 0
        mem[rsrc1 + (3 xor bs)] <- rsrc2<7:0>
        mem[rsrc1 + (2 xor bs)] <- rsrc2<15:8>
        mem[rsrc1 + (1 xor bs)] <- rsrc2<23:16>
        mem[rsrc1 + (0 xor bs)] <- rsrc2<31:24>
    }

attributes
    function unit: dmem
    operation code: 31
    number of operands: 2
    modifier: no
    modifier range: -
    latency: n/a
    issue slots: 4, 5

description
the st32 operation is a pseudo operation transformed by the scheduler into an h_st32d(0) with the same arguments. (note: pseudo operations cannot be used in assembly files.)
the st32 operation stores all 32 bits of rsrc2 into the memory locations pointed to by the address in rsrc1. st32 itself takes no modifier; the displacement of the underlying h_st32d is fixed at 0. this store operation is performed as little-endian or big-endian depending on the current setting of the bytesex bit in the pcsw.
if st32 is misaligned (the memory address in rsrc1 is not a multiple of 4), the result of st32 is undefined, and the mse (misaligned store exception) bit in the pcsw register is set to 1. additionally, if the trpmse (trap on misaligned store exception) bit in pcsw is 1, exception processing will be requested on the next interruptible jump.
the st32 operation can be used to access the mmio address aperture (the result of mmio access by 8- or 16-bit memory operations is undefined). the state of the bsx bit in the pcsw has no effect on mmio access by st32.
the st32 operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the addressed memory locations (and the modification of cache if the locations are cacheable). if the lsb of rguard is 1, the store takes effect. if the lsb of rguard is 0, st32 has no side effects whatever; in particular, the lru and other status bits in the data cache are not affected.

examples
initial values | operation | result
r10 = 0xd00, r80 = 0x44332211 | st32 r10 r80 | [0xd00] <- 0x44, [0xd01] <- 0x33, [0xd02] <- 0x22, [0xd03] <- 0x11
r50 = 0, r20 = 0xd01, r70 = 0xaabbccdd | if r50 st32 r20 r70 | no change, since guard is false
r60 = 1, r30 = 0xd04, r70 = 0xaabbccdd | if r60 st32 r30 r70 | [0xd04] <- 0xaa, [0xd05] <- 0xbb, [0xd06] <- 0xcc, [0xd07] <- 0xdd

see also
h_st32d st32d st16 st16d st8 st8d
st32d
32-bit store with displacement
pseudo-op for h_st32d

syntax
    [ if rguard ] st32d(d) rsrc1 rsrc2

function
    if rguard then {
        if pcsw.bytesex = little_endian then
            bs <- 3
        else
            bs <- 0
        mem[rsrc1 + d + (3 xor bs)] <- rsrc2<7:0>
        mem[rsrc1 + d + (2 xor bs)] <- rsrc2<15:8>
        mem[rsrc1 + d + (1 xor bs)] <- rsrc2<23:16>
        mem[rsrc1 + d + (0 xor bs)] <- rsrc2<31:24>
    }

attributes
    function unit: dmem
    operation code: 31
    number of operands: 2
    modifier: 7 bits
    modifier range: -256..252 by 4
    latency: n/a
    issue slots: 4, 5

description
the st32d operation is a pseudo operation transformed by the scheduler into an h_st32d with the same arguments. (note: pseudo operations cannot be used in assembly files.)
the st32d operation stores all 32 bits of rsrc2 into the memory locations pointed to by the address in rsrc1 + d. the d value is an opcode modifier, must be in the range -256 to 252 inclusive, and must be a multiple of 4. this store operation is performed as little-endian or big-endian depending on the current setting of the bytesex bit in the pcsw.
if st32d is misaligned (the memory address computed by rsrc1 + d is not a multiple of 4), the result of st32d is undefined, and the mse (misaligned store exception) bit in the pcsw register is set to 1. additionally, if the trpmse (trap on misaligned store exception) bit in pcsw is 1, exception processing will be requested on the next interruptible jump.
the st32d operation can be used to access the mmio address aperture (the result of mmio access by 8- or 16-bit memory operations is undefined). the state of the bsx bit in the pcsw has no effect on mmio access by st32d.
the st32d operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the addressed memory locations (and the modification of cache if the locations are cacheable). if the lsb of rguard is 1, the store takes effect. if the lsb of rguard is 0, st32d has no side effects whatever; in particular, the lru and other status bits in the data cache are not affected.

examples
initial values | operation | result
r10 = 0xcfc, r80 = 0x44332211 | st32d(4) r10 r80 | [0xd00] <- 0x44, [0xd01] <- 0x33, [0xd02] <- 0x22, [0xd03] <- 0x11
r50 = 0, r20 = 0xd0b, r70 = 0xaabbccdd | if r50 st32d(-8) r20 r70 | no change, since guard is false
r60 = 1, r30 = 0xd0c, r70 = 0xaabbccdd | if r60 st32d(-8) r30 r70 | [0xd04] <- 0xaa, [0xd05] <- 0xbb, [0xd06] <- 0xcc, [0xd07] <- 0xdd

see also
h_st32d st32 st16 st16d st8 st8d
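the displacement rules and the misalignment behaviour of st32d can be combined in one small model. this sketch assumes the operator in the function pseudocode is an exclusive or, models memory as a dict, and returns the value the mse bit would take; performing no store on a misaligned address is a choice of the sketch, since the data book leaves that result undefined.

```python
def st32d(mem, d, rsrc1, rsrc2, little_endian=False):
    """Software model of st32d. Returns True if mse would be set."""
    if not (-256 <= d <= 252) or d % 4 != 0:
        raise ValueError("d must be a multiple of 4 in -256..252")
    addr = rsrc1 + d
    if addr % 4 != 0:
        return True                       # misaligned: mse set, no store modelled
    bs = 3 if little_endian else 0
    for i in range(4):
        # offset (i xor bs) receives byte 3 - i, counting from the lsb
        mem[addr + (i ^ bs)] = (rsrc2 >> (8 * (3 - i))) & 0xFF
    return False


mem = {}
print(st32d(mem, 4, 0xCFC, 0x44332211), mem)
```

with d = 0 the same model describes st32, and the alignment test mirrors the one given for the 16-bit stores with a multiple of 2 in place of 4.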
philips semiconductors pnx1300/01/02/11 dspcpu operations preliminary specification a-166

st8
8-bit store
pseudo-op for h_st8d(0)

syntax
    [ if rguard ] st8 rsrc1 rsrc2

function
    if rguard then
        mem[rsrc1] ← rsrc2<7:0>

attributes
    function unit: dmem
    operation code: 29
    number of operands: 2
    modifier: no
    modifier range: -
    latency: n/a
    issue slots: 4, 5

description
the st8 operation is a pseudo operation transformed by the scheduler into an h_st8d(0) with the same arguments. (note: pseudo operations cannot be used in assembly files.)

the st8 operation stores the least-significant 8-bit byte of rsrc2 into the memory location pointed to by the address in rsrc1. this operation does not depend on the bytesex bit in the pcsw since only a single byte is stored.

the result of an access by st8 to the mmio address aperture is undefined; access to the mmio aperture is defined only for 32-bit loads and stores.

the st8 operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the addressed memory location (and the modification of cache if the location is cacheable). if the lsb of rguard is 1, the store takes effect. if the lsb of rguard is 0, st8 has no side effects whatever; in particular, the lru and other status bits in the data cache are not affected.

examples
    initial values | operation | result
    r10 = 0xd00, r80 = 0x44332211 | st8 r10 r80 | [0xd00] ← 0x11
    r50 = 0, r20 = 0xd01, r70 = 0xaabbccdd | if r50 st8 r20 r70 | no change, since guard is false
    r60 = 1, r30 = 0xd02, r70 = 0xaabbccdd | if r60 st8 r30 r70 | [0xd02] ← 0xdd

see also
    h_st8d st8d st16 st16d st32 st32d
st8d
8-bit store with displacement
pseudo-op for h_st8d

syntax
    [ if rguard ] st8d(d) rsrc1 rsrc2

function
    if rguard then
        mem[rsrc1 + d] ← rsrc2<7:0>

attributes
    function unit: dmem
    operation code: 29
    number of operands: 2
    modifier: 7 bits
    modifier range: -64..63
    latency: n/a
    issue slots: 4, 5

description
the st8d operation is a pseudo operation transformed by the scheduler into an h_st8d with the same arguments. (note: pseudo operations cannot be used in assembly files.)

the st8d operation stores the least-significant 8-bit byte of rsrc2 into the memory location pointed to by the address formed from the sum rsrc1 + d. the value of the opcode modifier d must be in the range -64 to 63 inclusive. this operation does not depend on the bytesex bit in the pcsw since only a single byte is stored.

the result of an access by st8d to the mmio address aperture is undefined; access to the mmio aperture is defined only for 32-bit loads and stores.

the st8d operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the addressed memory location (and the modification of cache if the location is cacheable). if the lsb of rguard is 1, the store takes effect. if the lsb of rguard is 0, st8d has no side effects whatever; in particular, the lru and other status bits in the data cache are not affected.

examples
    initial values | operation | result
    r10 = 0xd00, r80 = 0x44332211 | st8d(3) r10 r80 | [0xd03] ← 0x11
    r50 = 0, r20 = 0xd01, r70 = 0xaabbccdd | if r50 st8d(-4) r20 r70 | no change, since guard is false
    r60 = 1, r30 = 0xd02, r70 = 0xaabbccdd | if r60 st8d(-4) r30 r70 | [0xcfe] ← 0xdd

see also
    h_st8d st8 st16 st16d st32 st32d
ubytesel
select unsigned byte

syntax
    [ if rguard ] ubytesel rsrc1 rsrc2 rdest

function
    if rguard then {
        if rsrc2 = 0 then rdest ← zero_ext8to32(rsrc1<7:0>)
        else if rsrc2 = 1 then rdest ← zero_ext8to32(rsrc1<15:8>)
        else if rsrc2 = 2 then rdest ← zero_ext8to32(rsrc1<23:16>)
        else if rsrc2 = 3 then rdest ← zero_ext8to32(rsrc1<31:24>)
    }

attributes
    function unit: alu
    operation code: 55
    number of operands: 2
    modifier: no
    modifier range: -
    latency: 1
    issue slots: 1, 2, 3, 4, 5

description
the ubytesel operation selects one byte from the argument, rsrc1, zero-extends the byte to 32 bits, and stores the result in rdest. the value of rsrc2 determines which byte is selected, with rsrc2 = 0 selecting the least-significant byte of rsrc1 and rsrc2 = 3 selecting the most-significant byte of rsrc1. if rsrc2 is not between 0 and 3 inclusive, the result of ubytesel is undefined.

[figure: ubytesel data flow. rsrc2 selects one unsigned byte of rsrc1; the byte is written to rdest<7:0> and rdest<31:8> is filled with zeros.]

the ubytesel operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

examples
    initial values | operation | result
    r30 = 0x44332211, r40 = 1 | ubytesel r30 r40 r50 | r50 ← 0x00000022
    r10 = 0, r60 = 0xddccbbaa, r70 = 2 | if r10 ubytesel r60 r70 r80 | no change, since guard is false
    r20 = 1, r60 = 0xddccbbaa, r70 = 2 | if r20 ubytesel r60 r70 r90 | r90 ← 0x000000cc
    r100 = 0xffffff7f, r110 = 0 | ubytesel r100 r110 r120 | r120 ← 0x0000007f

see also
    ibytesel sex8 packbytes
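the byte selection amounts to a shift by 8 × rsrc2 followed by a mask. a minimal python sketch, assuming 32-bit register values are passed as non-negative integers; the function name ubytesel_model is hypothetical, and the undefined case (rsrc2 outside 0..3) is modelled as an error rather than guessed at:

```python
def ubytesel_model(rsrc1, rsrc2):
    """Return byte number rsrc2 of rsrc1, zero-extended to 32 bits."""
    if not 0 <= rsrc2 <= 3:
        # The data book leaves this case undefined; raise instead of guessing.
        raise ValueError("rsrc2 must be between 0 and 3 inclusive")
    return (rsrc1 >> (8 * rsrc2)) & 0xFF
```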
uclipi
clip signed to unsigned

syntax
    [ if rguard ] uclipi rsrc1 rsrc2 rdest

function
    if rguard then
        rdest ← min(max(rsrc1, 0), rsrc2)

attributes
    function unit: dspalu
    operation code: 75
    number of operands: 2
    modifier: no
    modifier range: -
    latency: 2
    issue slots: 1, 3

description
the uclipi operation returns the value of rsrc1 clipped into the unsigned integer range 0 to rsrc2, inclusive. the argument rsrc1 is considered a signed integer; rsrc2 is considered an unsigned integer.

the uclipi operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

examples
    initial values | operation | result
    r30 = 0x80, r40 = 0x7f | uclipi r30 r40 r50 | r50 ← 0x7f
    r10 = 0, r60 = 0x12345678, r70 = 0xabc | if r10 uclipi r60 r70 r80 | no change, since guard is false
    r20 = 1, r60 = 0x12345678, r70 = 0xabc | if r20 uclipi r60 r70 r90 | r90 ← 0xabc
    r100 = 0x80000000, r110 = 0x3fffff | uclipi r100 r110 r120 | r120 ← 0

see also
    iclipi uclipu imin imax
uclipu
clip unsigned to unsigned

syntax
    [ if rguard ] uclipu rsrc1 rsrc2 rdest

function
    if rguard then {
        if rsrc1 > rsrc2 then rdest ← rsrc2
        else rdest ← rsrc1
    }

attributes
    function unit: dspalu
    operation code: 76
    number of operands: 2
    modifier: no
    modifier range: -
    latency: 2
    issue slots: 1, 3

description
the uclipu operation returns the value of rsrc1 clipped into the unsigned integer range 0 to rsrc2, inclusive. the arguments rsrc1 and rsrc2 are considered unsigned integers.

the uclipu operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

examples
    initial values | operation | result
    r30 = 0x80, r40 = 0x7f | uclipu r30 r40 r50 | r50 ← 0x7f
    r10 = 0, r60 = 0x12345678, r70 = 0xabc | if r10 uclipu r60 r70 r80 | no change, since guard is false
    r20 = 1, r60 = 0x12345678, r70 = 0xabc | if r20 uclipu r60 r70 r90 | r90 ← 0xabc
    r100 = 0x80000000, r110 = 0x3fffff | uclipu r100 r110 r120 | r120 ← 0x3fffff

see also
    iclipi uclipi imin imax
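uclipi and uclipu differ only in how rsrc1 is interpreted, which matters once bit 31 of rsrc1 is set. the following python sketch contrasts the two; the function names and the helper to_signed32 are hypothetical, and register values are assumed to be passed as 32-bit bit patterns.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(x):
    """Reinterpret a 32-bit pattern as a two's-complement signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def uclipi_model(rsrc1, rsrc2):
    # rsrc1 is signed, rsrc2 is unsigned: negative inputs clip to 0.
    return min(max(to_signed32(rsrc1), 0), rsrc2 & MASK32)

def uclipu_model(rsrc1, rsrc2):
    # Both arguments are unsigned: only the upper bound can clip.
    a, b = rsrc1 & MASK32, rsrc2 & MASK32
    return b if a > b else a
```

for rsrc1 = 0x80000000 and rsrc2 = 0x3fffff, the signed reading is negative and clips to 0, while the unsigned reading exceeds the bound and clips to 0x3fffff, matching the last example of each operation.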
ueql
unsigned compare equal
pseudo-op for ieql

syntax
    [ if rguard ] ueql rsrc1 rsrc2 rdest

function
    if rguard then {
        if rsrc1 = rsrc2 then rdest ← 1
        else rdest ← 0
    }

attributes
    function unit: alu
    operation code: 37
    number of operands: 2
    modifier: no
    modifier range: -
    latency: 1
    issue slots: 1, 2, 3, 4, 5

description
the ueql operation is a pseudo operation transformed by the scheduler into an ieql with the same arguments. (note: pseudo operations cannot be used in assembly files.)

the ueql operation sets the destination register, rdest, to 1 if the first argument, rsrc1, is equal to the second argument, rsrc2; otherwise, rdest is set to 0. the arguments are treated as unsigned integers.

the ueql operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

examples
    initial values | operation | result
    r30 = 3, r40 = 4 | ueql r30 r40 r80 | r80 ← 0
    r10 = 0, r60 = 0x100, r30 = 3 | if r10 ueql r60 r30 r50 | no change, since guard is false
    r20 = 1, r50 = 0x1000, r60 = 0x1000 | if r20 ueql r50 r60 r90 | r90 ← 1
    r70 = 0x80000000, r40 = 4 | ueql r70 r40 r100 | r100 ← 0
    r70 = 0x80000000 | ueql r70 r70 r110 | r110 ← 1

see also
    ieql ueqli igeq uneq
ueqli
unsigned compare equal with immediate

syntax
    [ if rguard ] ueqli(n) rsrc1 rdest

function
    if rguard then {
        if rsrc1 = n then rdest ← 1
        else rdest ← 0
    }

attributes
    function unit: alu
    operation code: 38
    number of operands: 1
    modifier: 7 bits
    modifier range: 0..127
    latency: 1
    issue slots: 1, 2, 3, 4, 5

description
the ueqli operation sets the destination register, rdest, to 1 if the first argument, rsrc1, is equal to the opcode modifier, n; otherwise, rdest is set to 0. the arguments are treated as unsigned integers.

the ueqli operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

examples
    initial values | operation | result
    r30 = 3 | ueqli(2) r30 r80 | r80 ← 0
    r30 = 3 | ueqli(3) r30 r90 | r90 ← 1
    r30 = 3 | ueqli(4) r30 r100 | r100 ← 0
    r10 = 0, r40 = 0x100 | if r10 ueqli(63) r40 r50 | no change, since guard is false
    r20 = 1, r40 = 0x100 | if r20 ueqli(63) r40 r100 | r100 ← 0
    r60 = 0x07f | ueqli(127) r60 r120 | r120 ← 1

see also
    ieqli ueql igeqi uneqi
ufir16
sum of products of unsigned 16-bit halfwords

syntax
    [ if rguard ] ufir16 rsrc1 rsrc2 rdest

function
    if rguard then
        rdest ← zero_ext16to32(rsrc1<31:16>) × zero_ext16to32(rsrc2<31:16>) +
                zero_ext16to32(rsrc1<15:0>) × zero_ext16to32(rsrc2<15:0>)

attributes
    function unit: dspmul
    operation code: 94
    number of operands: 2
    modifier: no
    modifier range: -
    latency: 3
    issue slots: 2, 3

description
the ufir16 operation computes two separate products of the two pairs of corresponding 16-bit halfwords of rsrc1 and rsrc2; the two products are summed, and the result is written to rdest. all halfwords are considered unsigned; thus, the intermediate products and the final sum of products are unsigned. all intermediate computations are performed without loss of precision; the final sum of products is clipped into the range [0xffffffff..0] before being written into rdest.

[figure: ufir16 data flow. the unsigned halfwords of rsrc1 and rsrc2 are multiplied pairwise and added; the full-precision 33-bit unsigned result is clipped to [2^32-1..0] and written to rdest.]

the ufir16 operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

examples
    initial values | operation | result
    r30 = 0x00020003, r40 = 0x00010002 | ufir16 r30 r40 r50 | r50 ← 8
    r10 = 0, r60 = 0x80000064, r70 = 0x00648000 | if r10 ufir16 r60 r70 r80 | no change, since guard is false
    r20 = 1, r60 = 0x80000064, r70 = 0x00648000 | if r20 ufir16 r60 r70 r90 | r90 ← 0x00640000
    r30 = 0x00020003, r70 = 0x00648000 | ufir16 r30 r70 r100 | r100 ← 0x000180c8

see also
    ifir16 ifir8ii ifir8ui ufir8uu
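the clipping step only matters when the 33-bit sum exceeds 2^32 - 1. a short python sketch of the arithmetic, relying on python's arbitrary-precision integers for the full-precision intermediate sum; the name ufir16_model is hypothetical:

```python
def ufir16_model(rsrc1, rsrc2):
    """Sum of the two unsigned halfword products, clipped to 32 bits."""
    hi = ((rsrc1 >> 16) & 0xFFFF) * ((rsrc2 >> 16) & 0xFFFF)
    lo = (rsrc1 & 0xFFFF) * (rsrc2 & 0xFFFF)
    # hi + lo can need 33 bits; clip to the unsigned 32-bit maximum.
    return min(hi + lo, 0xFFFFFFFF)
```

for example, with both arguments equal to 0xffffffff each product is 0xfffe0001, so the sum 0x1fffc0002 does not fit in 32 bits and the model returns 0xffffffff.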
ufir8uu
unsigned sum of products of unsigned bytes

syntax
    [ if rguard ] ufir8uu rsrc1 rsrc2 rdest

function
    if rguard then
        rdest ← zero_ext8to32(rsrc1<31:24>) × zero_ext8to32(rsrc2<31:24>) +
                zero_ext8to32(rsrc1<23:16>) × zero_ext8to32(rsrc2<23:16>) +
                zero_ext8to32(rsrc1<15:8>) × zero_ext8to32(rsrc2<15:8>) +
                zero_ext8to32(rsrc1<7:0>) × zero_ext8to32(rsrc2<7:0>)

attributes
    function unit: dspmul
    operation code: 90
    number of operands: 2
    modifier: no
    modifier range: -
    latency: 3
    issue slots: 2, 3

description
the ufir8uu operation computes four separate products of the four pairs of corresponding 8-bit bytes of rsrc1 and rsrc2; the four products are summed, and the result is written to rdest. all values are considered unsigned. all computations are performed without loss of precision.

[figure: ufir8uu data flow. the four unsigned bytes of rsrc1 and rsrc2 are multiplied pairwise and the four unsigned products are added into rdest.]

the ufir8uu operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

examples
    initial values | operation | result
    r70 = 0x0afb14f6, r30 = 0x0a0a1414 | ufir8uu r70 r30 r90 | r90 ← 0x1efa
    r10 = 0, r70 = 0x0afb14f6, r30 = 0x0a0a1414 | if r10 ufir8uu r70 r30 r100 | no change, since guard is false
    r20 = 1, r80 = 0x649c649c, r40 = 0x9c649c64 | if r20 ufir8uu r80 r40 r110 | r110 ← 0xf3c0
    r50 = 0x80808080, r60 = 0xffffffff | ufir8uu r50 r60 r120 | r120 ← 0x1fe00

see also
    ifir8ui ifir8ii ifir16 ufir16
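unlike ufir16, no clipping is needed here: four byte products sum to at most 4 × 255 × 255 = 260100, which always fits in 32 bits. a minimal python sketch of the computation (the name ufir8uu_model is hypothetical):

```python
def ufir8uu_model(rsrc1, rsrc2):
    """Sum of the four products of corresponding unsigned bytes."""
    total = 0
    for shift in (0, 8, 16, 24):
        total += ((rsrc1 >> shift) & 0xFF) * ((rsrc2 >> shift) & 0xFF)
    return total
```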
ufixieee
convert floating-point to unsigned integer using pcsw rounding mode

syntax
    [ if rguard ] ufixieee rsrc1 rdest

function
    if rguard then {
        rdest ← (unsigned long) ((float)rsrc1)
    }

attributes
    function unit: falu
    operation code: 123
    number of operands: 1
    modifier: no
    modifier range: -
    latency: 3
    issue slots: 1, 4

description
the ufixieee operation converts the single-precision ieee floating-point value in rsrc1 to an unsigned integer and writes the result into rdest. rounding is according to the ieee rounding mode bits in pcsw. if rsrc1 is denormalized, zero is substituted before conversion, and the ifz flag in the pcsw is set.

if ufixieee causes an ieee exception, such as overflow or underflow, the corresponding exception flags in the pcsw are set. the pcsw exception flags are sticky: the flags can be set as a side-effect of any floating-point operation but can only be reset by an explicit writepcsw operation. the update of the pcsw exception flags occurs at the same time as rdest is written. if any other floating-point compute operations update the pcsw at the same time, the net result in each exception flag is the logical or of all simultaneous updates ored with the existing pcsw value for that exception flag.

the ufixieeeflags operation computes the exception flags that would result from an individual ufixieee.

the ufixieee operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest and the exception flags in pcsw are written; otherwise, rdest is not changed and the operation does not affect the exception flags in pcsw.

examples
    initial values | operation | result
    r30 = 0x40400000 (3.0) | ufixieee r30 r100 | r100 ← 3
    r35 = 0x40247ae1 (2.57) | ufixieee r35 r102 | r102 ← 3, inx flag set
    r10 = 0, r40 = 0xff4fffff (-3.402823466e+38) | if r10 ufixieee r40 r105 | no change, since guard is false
    r20 = 1, r40 = 0xff4fffff (-3.402823466e+38) | if r20 ufixieee r40 r110 | r110 ← 0x0, inv flag set
    r45 = 0x7f800000 (+inf) | ufixieee r45 r112 | r112 ← 0xffffffff (2^32-1), inv flag set
    r50 = 0xbfc147ae (-1.51) | ufixieee r50 r115 | r115 ← 0, inv flag set
    r60 = 0x00400000 (5.877471754e-39) | ufixieee r60 r117 | r117 ← 0, ifz set
    r70 = 0xffffffff (qnan) | ufixieee r70 r120 | r120 ← 0, inv flag set
    r80 = 0xffbfffff (snan) | ufixieee r80 r122 | r122 ← 0, inv flag set

see also
    ifixieee ifixrz ufixrz
ufixieeeflags
ieee status flags from convert floating-point to unsigned integer using pcsw rounding mode

syntax
    [ if rguard ] ufixieeeflags rsrc1 rdest

function
    if rguard then
        rdest ← ieee_flags((unsigned long) ((float)rsrc1))

attributes
    function unit: falu
    operation code: 124
    number of operands: 1
    modifier: no
    modifier range: -
    latency: 3
    issue slots: 1, 4

description
the ufixieeeflags operation computes the ieee exceptions that would result from converting the single-precision ieee floating-point value in rsrc1 to an unsigned integer, and an integer bit vector representing the computed exception flags is written into rdest. the bit vector stored in rdest has the same format as the ieee exception bits in the pcsw. the exception flags in pcsw are left unchanged by this operation. rounding is according to the ieee rounding mode bits in pcsw. if an argument is denormalized, zero is substituted before computing the conversion, and the ifz bit in the result is set.

[figure: format of the flag bit vector in rdest. bits 6 down to 0 hold ofz, ifz, inv, ovf, unf, inx, dbz; bits 31 to 7 are 0.]

the ufixieeeflags operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

examples
    initial values | operation | result
    r30 = 0x40400000 (3.0) | ufixieeeflags r30 r100 | r100 ← 0
    r35 = 0x40247ae1 (2.57) | ufixieeeflags r35 r102 | r102 ← 0x02 (inx)
    r10 = 0, r40 = 0xff4fffff (-3.402823466e+38) | if r10 ufixieeeflags r40 r105 | no change, since guard is false
    r20 = 1, r40 = 0xff4fffff (-3.402823466e+38) | if r20 ufixieeeflags r40 r110 | r110 ← 0x10 (inv)
    r45 = 0x7f800000 (+inf) | ufixieeeflags r45 r112 | r112 ← 0x10 (inv)
    r50 = 0xbfc147ae (-1.51) | ufixieeeflags r50 r115 | r115 ← 0x10 (inv)
    r60 = 0x00400000 (5.877471754e-39) | ufixieeeflags r60 r117 | r117 ← 0x20 (ifz)
    r70 = 0xffffffff (qnan) | ufixieeeflags r70 r120 | r120 ← 0x10 (inv)
    r80 = 0xffbfffff (snan) | ufixieeeflags r80 r122 | r122 ← 0x10 (inv)

see also
    ufixieee ifixieeeflags ifixrzflags ufixrzflags
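software that inspects the result of a flags operation typically tests individual bits of the vector. the python sketch below decodes a flag vector into names. the masks for inx (0x02), inv (0x10) and ifz (0x20) come directly from the examples; the remaining positions are an assumption inferred from the order of the names in the flag-format figure, and the function name decode_ieee_flags is hypothetical.

```python
# Masks for inx, inv and ifz are confirmed by the examples; dbz, unf, ovf and
# ofz are inferred from the figure's bit order and should be treated as assumed.
IEEE_FLAGS = (
    ("dbz", 0x01),
    ("inx", 0x02),
    ("unf", 0x04),
    ("ovf", 0x08),
    ("inv", 0x10),
    ("ifz", 0x20),
    ("ofz", 0x40),
)

def decode_ieee_flags(vector):
    """Return the names of all flags set in a flag bit vector, lsb first."""
    return [name for name, mask in IEEE_FLAGS if vector & mask]
```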
ufixrz
convert floating-point to unsigned integer with round toward zero

syntax
    [ if rguard ] ufixrz rsrc1 rdest

function
    if rguard then {
        rdest ← (unsigned long) ((float)rsrc1)
    }

attributes
    function unit: falu
    operation code: 125
    number of operands: 1
    modifier: no
    modifier range: -
    latency: 3
    issue slots: 1, 4

description
the ufixrz operation converts the single-precision ieee floating-point value in rsrc1 to an unsigned integer and writes the result into rdest. rounding toward zero is performed; the ieee rounding mode bits in pcsw are ignored. this is the preferred rounding mode for ansi c. if rsrc1 is denormalized, zero is substituted before conversion, and the ifz flag in the pcsw is set.

if ufixrz causes an ieee exception, such as overflow or underflow, the corresponding exception flags in the pcsw are set. the pcsw exception flags are sticky: the flags can be set as a side-effect of any floating-point operation but can only be reset by an explicit writepcsw operation. the update of the pcsw exception flags occurs at the same time as rdest is written. if any other floating-point compute operations update the pcsw at the same time, the net result in each exception flag is the logical or of all simultaneous updates ored with the existing pcsw value for that exception flag.

the ufixrzflags operation computes the exception flags that would result from an individual ufixrz.

the ufixrz operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest and the exception flags in pcsw are written; otherwise, rdest is not changed and the operation does not affect the exception flags in pcsw.

examples
    initial values | operation | result
    r30 = 0x40400000 (3.0) | ufixrz r30 r100 | r100 ← 3
    r35 = 0x40247ae1 (2.57) | ufixrz r35 r102 | r102 ← 2, inx flag set
    r10 = 0, r40 = 0xff4fffff (-3.402823466e+38) | if r10 ufixrz r40 r105 | no change, since guard is false
    r20 = 1, r40 = 0xff4fffff (-3.402823466e+38) | if r20 ufixrz r40 r110 | r110 ← 0x0, inv flag set
    r45 = 0x7f800000 (+inf) | ufixrz r45 r112 | r112 ← 0xffffffff (2^32-1), inv flag set
    r50 = 0xbfc147ae (-1.51) | ufixrz r50 r115 | r115 ← 0, inv flag set
    r60 = 0x00400000 (5.877471754e-39) | ufixrz r60 r117 | r117 ← 0, ifz set
    r70 = 0xffffffff (qnan) | ufixrz r70 r120 | r120 ← 0, inv flag set
    r80 = 0xffbfffff (snan) | ufixrz r80 r122 | r122 ← 0, inv flag set

see also
    ifixieee ufixieee ifixrz
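the example table can be reproduced with a small python model that takes the raw 32-bit pattern and returns the result together with a flag vector. this is a sketch under assumptions, not a description of the hardware: the name ufixrz_model is hypothetical, only the inx, inv and ifz flags are modelled, and the treatment of values strictly between -1 and 0 (truncated to 0 with inx) is a guess that the examples do not cover.

```python
import math
import struct

INX, INV, IFZ = 0x02, 0x10, 0x20  # masks as shown in the flags examples

def ufixrz_model(bits):
    """Return (rdest, flags) for a round-toward-zero float-to-unsigned convert."""
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0 and fraction != 0:
        return 0, IFZ                      # denormal input: zero is substituted
    value = struct.unpack(">f", struct.pack(">I", bits & 0xFFFFFFFF))[0]
    if math.isnan(value):
        return 0, INV
    if value >= 2.0 ** 32:
        return 0xFFFFFFFF, INV             # too large, including +inf
    if value <= -1.0:
        return 0, INV                      # negative, including -inf
    result = int(value)                    # int() truncates toward zero
    return result, (INX if result != value else 0)
```

for instance, ufixrz_model(0x40247ae1) yields (2, INX), matching the 2.57 example.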
ufixrzflags
ieee status flags from convert floating-point to unsigned integer with round toward zero

syntax
    [ if rguard ] ufixrzflags rsrc1 rdest

function
    if rguard then
        rdest ← ieee_flags((unsigned long) ((float)rsrc1))

attributes
    function unit: falu
    operation code: 126
    number of operands: 1
    modifier: no
    modifier range: -
    latency: 3
    issue slots: 1, 4

description
the ufixrzflags operation computes the ieee exceptions that would result from converting the single-precision ieee floating-point value in rsrc1 to an unsigned integer, and an integer bit vector representing the computed exception flags is written into rdest. the bit vector stored in rdest has the same format as the ieee exception bits in the pcsw. the exception flags in pcsw are left unchanged by this operation. rounding toward zero is performed; the ieee rounding mode bits in pcsw are ignored. if an argument is denormalized, zero is substituted before computing the conversion, and the ifz bit in the result is set.

[figure: format of the flag bit vector in rdest. bits 6 down to 0 hold ofz, ifz, inv, ovf, unf, inx, dbz; bits 31 to 7 are 0.]

the ufixrzflags operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

examples
    initial values | operation | result
    r30 = 0x40400000 (3.0) | ufixrzflags r30 r100 | r100 ← 0
    r35 = 0x40247ae1 (2.57) | ufixrzflags r35 r102 | r102 ← 0x02 (inx)
    r10 = 0, r40 = 0xff4fffff (-3.402823466e+38) | if r10 ufixrzflags r40 r105 | no change, since guard is false
    r20 = 1, r40 = 0xff4fffff (-3.402823466e+38) | if r20 ufixrzflags r40 r110 | r110 ← 0x10 (inv)
    r45 = 0x7f800000 (+inf) | ufixrzflags r45 r112 | r112 ← 0x10 (inv)
    r50 = 0xbfc147ae (-1.51) | ufixrzflags r50 r115 | r115 ← 0x10 (inv)
    r60 = 0x00400000 (5.877471754e-39) | ufixrzflags r60 r117 | r117 ← 0x20 (ifz)
    r70 = 0xffffffff (qnan) | ufixrzflags r70 r120 | r120 ← 0x10 (inv)
    r80 = 0xffbfffff (snan) | ufixrzflags r80 r122 | r122 ← 0x10 (inv)

see also
    ufixrz ifixrzflags ifixieeeflags ufixieeeflags
ufloat
convert unsigned integer to floating-point

syntax
    [ if rguard ] ufloat rsrc1 rdest

function
    if rguard then {
        rdest ← (float) ((unsigned long)rsrc1)
    }

attributes
    function unit: falu
    operation code: 127
    number of operands: 1
    modifier: no
    modifier range: -
    latency: 3
    issue slots: 1, 4

description
the ufloat operation converts the unsigned integer value in rsrc1 to single-precision ieee floating-point format and writes the result into rdest. rounding is according to the ieee rounding mode bits in pcsw.

if ufloat causes an ieee exception, such as inexact, the corresponding exception flags in the pcsw are set. the pcsw exception flags are sticky: the flags can be set as a side-effect of any floating-point operation but can only be reset by an explicit writepcsw operation. the update of the pcsw exception flags occurs at the same time as rdest is written. if any other floating-point compute operations update the pcsw at the same time, the net result in each exception flag is the logical or of all simultaneous updates ored with the existing pcsw value for that exception flag.

the ufloatflags operation computes the exception flags that would result from an individual ufloat.

the ufloat operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest and the exception flags in pcsw are written; otherwise, rdest is not changed and the operation does not affect the exception flags in pcsw.

examples
    initial values | operation | result
    r30 = 3 | ufloat r30 r100 | r100 ← 0x40400000 (3.0)
    r40 = 0xffffffff (4294967295) | ufloat r40 r105 | r105 ← 0x4f800000 (4.294967296e+9), inx flag set
    r10 = 0, r50 = 0xfffffffd | if r10 ufloat r50 r110 | no change, since guard is false
    r20 = 1, r50 = 0xfffffffd | if r20 ufloat r50 r115 | r115 ← 0x4f800000 (4.294967296e+9), inx flag set
    r60 = 0x7fffffff (2147483647) | ufloat r60 r117 | r117 ← 0x4f000000 (2.147483648e+9), inx flag set
    r70 = 0x80000000 (2147483648) | ufloat r70 r120 | r120 ← 0x4f000000 (2.147483648e+9)
    r80 = 0x7ffffff1 (2147483633) | ufloat r80 r122 | r122 ← 0x4f000000 (2.147483648e+9), inx flag set

see also
    ifloat ifloatrz ufloatrz ifixieee ufloatflags
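the ufloat examples above are consistent with round-to-nearest, and that single mode can be modelled in python with the struct module, which rounds a double to single precision to nearest-even when packing with the 'f' format. the sketch below covers only that mode (the other pcsw rounding modes are not modelled), reports only the inx flag, and uses a hypothetical function name.

```python
import struct

INX = 0x02  # inexact mask as shown in the flags examples

def ufloat_nearest_model(rsrc1):
    """Return (single-precision bit pattern, flags) for round-to-nearest."""
    x = rsrc1 & 0xFFFFFFFF
    # float(x) is exact for any 32-bit integer; packing as 'f' rounds to single.
    bits = struct.unpack(">I", struct.pack(">f", float(x)))[0]
    rounded = struct.unpack(">f", struct.pack(">I", bits))[0]
    return bits, (INX if rounded != x else 0)
```

a 32-bit integer is inexact in single precision whenever it has more than 24 significant bits, which is why 0x7fffffff rounds up to 2^31 with inx set while 0x80000000 converts exactly.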
ufloatflags
ieee status flags from convert unsigned integer to floating-point

syntax
    [ if rguard ] ufloatflags rsrc1 rdest

function
    if rguard then
        rdest ← ieee_flags((float) ((unsigned long)rsrc1))

attributes
    function unit: falu
    operation code: 128
    number of operands: 1
    modifier: no
    modifier range: -
    latency: 3
    issue slots: 1, 4

description
the ufloatflags operation computes the ieee exceptions that would result from converting the unsigned integer in rsrc1 to a single-precision ieee floating-point value, and an integer bit vector representing the computed exception flags is written into rdest. the bit vector stored in rdest has the same format as the ieee exception bits in the pcsw. the exception flags in pcsw are left unchanged by this operation. rounding is according to the ieee rounding mode bits in pcsw.

[figure: format of the flag bit vector in rdest. bits 6 down to 0 hold ofz, ifz, inv, ovf, unf, inx, dbz; bits 31 to 7 are 0.]

the ufloatflags operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

examples
    initial values | operation | result
    r30 = 3 | ufloatflags r30 r100 | r100 ← 0
    r40 = 0xffffffff (4294967295) | ufloatflags r40 r105 | r105 ← 0x02 (inx)
    r10 = 0, r50 = 0xfffffffd | if r10 ufloatflags r50 r110 | no change, since guard is false
    r20 = 1, r50 = 0xfffffffd | if r20 ufloatflags r50 r115 | r115 ← 0x02 (inx)
    r60 = 0x7fffffff (2147483647) | ufloatflags r60 r117 | r117 ← 0x02 (inx)
    r70 = 0x80000000 (2147483648) | ufloatflags r70 r120 | r120 ← 0
    r80 = 0x7ffffff1 (2147483633) | ufloatflags r80 r122 | r122 ← 0x02 (inx)

see also
    ufloat ifloatflags ifloatrzflags ufloatrzflags
ufloatrz
convert unsigned integer to floating-point with rounding toward zero

syntax
    [ if rguard ] ufloatrz rsrc1 rdest

function
    if rguard then {
        rdest ← (float) ((unsigned long)rsrc1)
    }

attributes
    function unit: falu
    operation code: 119
    number of operands: 1
    modifier: no
    modifier range: -
    latency: 3
    issue slots: 1, 4

description
the ufloatrz operation converts the unsigned integer value in rsrc1 to single-precision ieee floating-point format and writes the result into rdest. rounding is performed toward zero; the ieee rounding mode bits in pcsw are ignored. this is the preferred rounding mode for ansi c.

if ufloatrz causes an ieee exception, such as inexact, the corresponding exception flags in the pcsw are set. the pcsw exception flags are sticky: the flags can be set as a side-effect of any floating-point operation but can only be reset by an explicit writepcsw operation. the update of the pcsw exception flags occurs at the same time as rdest is written. if any other floating-point compute operations update the pcsw at the same time, the net result in each exception flag is the logical or of all simultaneous updates ored with the existing pcsw value for that exception flag.

the ufloatrzflags operation computes the exception flags that would result from an individual ufloatrz.

the ufloatrz operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest and the exception flags in pcsw are written; otherwise, rdest is not changed and the operation does not affect the exception flags in pcsw.

examples
    initial values | operation | result
    r30 = 3 | ufloatrz r30 r100 | r100 ← 0x40400000 (3.0)
    r40 = 0xffffffff (4294967295) | ufloatrz r40 r105 | r105 ← 0x4f7fffff (4.294967040e+9), inx flag set
    r10 = 0, r50 = 0xfffffffd | if r10 ufloatrz r50 r110 | no change, since guard is false
    r20 = 1, r50 = 0xfffffffd | if r20 ufloatrz r50 r115 | r115 ← 0x4f7fffff (4.294967040e+9), inx flag set
    r60 = 0x7fffffff (2147483647) | ufloatrz r60 r117 | r117 ← 0x4effffff (2.147483520e+9), inx flag set
    r70 = 0x80000000 (2147483648) | ufloatrz r70 r120 | r120 ← 0x4f000000 (2.147483648e+9)
    r80 = 0x7ffffff1 (2147483633) | ufloatrz r80 r122 | r122 ← 0x4effffff (2.147483520e+9), inx flag set

see also
    ifloatrz ifloat ufloat ifixieee ufloatflags
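rounding toward zero for an unsigned integer means keeping its 24 most-significant bits and discarding the rest. the python sketch below does exactly that before converting, so the final conversion is exact; the function name is hypothetical and only the inx flag is reported.

```python
import struct

INX = 0x02  # inexact mask as shown in the flags examples

def ufloatrz_model(rsrc1):
    """Return (single-precision bit pattern, flags), rounding toward zero."""
    x = rsrc1 & 0xFFFFFFFF
    # Single precision holds 24 significant bits; drop anything below them.
    shift = max(x.bit_length() - 24, 0)
    truncated = (x >> shift) << shift
    # truncated now fits in 24 significant bits, so this packing is exact.
    bits = struct.unpack(">I", struct.pack(">f", float(truncated)))[0]
    return bits, (INX if truncated != x else 0)
```

for 0xffffffff the low 8 bits are dropped, giving 0xffffff00 = 4294967040, which is the 0x4f7fffff result shown in the examples.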
ufloatrzflags
ieee status flags from convert unsigned integer to floating-point with rounding toward zero

syntax
    [ if rguard ] ufloatrzflags rsrc1 rdest

function
    if rguard then
        rdest ← ieee_flags((float) ((unsigned long)rsrc1))

attributes
    function unit: falu
    operation code: 120
    number of operands: 1
    modifier: no
    modifier range: -
    latency: 3
    issue slots: 1, 4

description
the ufloatrzflags operation computes the ieee exceptions that would result from converting the unsigned integer in rsrc1 to a single-precision ieee floating-point value, and an integer bit vector representing the computed exception flags is written into rdest. the bit vector stored in rdest has the same format as the ieee exception bits in the pcsw. the exception flags in pcsw are left unchanged by this operation. rounding is performed toward zero; the ieee rounding mode bits in pcsw are ignored.

[figure: format of the flag bit vector in rdest. bits 6 down to 0 hold ofz, ifz, inv, ovf, unf, inx, dbz; bits 31 to 7 are 0.]

the ufloatrzflags operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

examples
    initial values | operation | result
    r30 = 3 | ufloatrzflags r30 r100 | r100 ← 0
    r40 = 0xffffffff (4294967295) | ufloatrzflags r40 r105 | r105 ← 0x02 (inx)
    r10 = 0, r50 = 0xfffffffd | if r10 ufloatrzflags r50 r110 | no change, since guard is false
    r20 = 1, r50 = 0xfffffffd | if r20 ufloatrzflags r50 r115 | r115 ← 0x02 (inx)
    r60 = 0x7fffffff (2147483647) | ufloatrzflags r60 r117 | r117 ← 0x02 (inx)
    r70 = 0x80000000 (2147483648) | ufloatrzflags r70 r120 | r120 ← 0
    r80 = 0x7ffffff1 (2147483633) | ufloatrzflags r80 r122 | r122 ← 0x02 (inx)

see also
    ufloatrz ifloatflags ufloatflags ifloatrzflags
ugeq
unsigned compare greater or equal

syntax
    [ if rguard ] ugeq rsrc1 rsrc2 rdest

function
    if rguard then {
        if (unsigned)rsrc1 >= (unsigned)rsrc2 then rdest ← 1
        else rdest ← 0
    }

attributes
    function unit: alu
    operation code: 35
    number of operands: 2
    modifier: no
    modifier range: -
    latency: 1
    issue slots: 1, 2, 3, 4, 5

description
the ugeq operation sets the destination register, rdest, to 1 if the first argument, rsrc1, is greater than or equal to the second argument, rsrc2; otherwise, rdest is set to 0. the arguments are treated as unsigned integers.

the ugeq operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.

examples
    initial values | operation | result
    r30 = 3, r40 = 4 | ugeq r30 r40 r80 | r80 ← 0
    r10 = 0, r60 = 0x100, r30 = 3 | if r10 ugeq r60 r30 r50 | no change, since guard is false
    r20 = 1, r50 = 0x1000, r60 = 0x100 | if r20 ugeq r50 r60 r90 | r90 ← 1
    r70 = 0x80000000, r40 = 4 | ugeq r70 r40 r100 | r100 ← 1
    r70 = 0x80000000 | ugeq r70 r70 r110 | r110 ← 1

see also
    igeq ugeqi
ugeqi: unsigned compare greater or equal with immediate
syntax: [ if rguard ] ugeqi(n) rsrc1 → rdest
function: if rguard then { if (unsigned)rsrc1 >= (unsigned)n then rdest ← 1 else rdest ← 0 }
attributes: function unit alu; operation code 36; number of operands 1; modifier 7 bits; modifier range 0..127; latency 1; issue slots 1, 2, 3, 4, 5
description: the ugeqi operation sets the destination register, rdest, to 1 if the first argument, rsrc1, is greater than or equal to the opcode modifier, n; otherwise, rdest is set to 0. the arguments are treated as unsigned integers.
the ugeqi operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.
examples (initial values | operation | result):
r30 = 3 | ugeqi(2) r30 → r80 | r80 ← 1
r30 = 3 | ugeqi(3) r30 → r90 | r90 ← 1
r30 = 3 | ugeqi(4) r30 → r100 | r100 ← 0
r10 = 0, r40 = 0x100 | if r10 ugeqi(63) r40 → r50 | no change, since guard is false
r20 = 1, r40 = 0x100 | if r20 ugeqi(63) r40 → r100 | r100 ← 1
r60 = 0x80000000 | ugeqi(127) r60 → r120 | r120 ← 1
see also: ugeq igeqi
ugtr: unsigned compare greater
syntax: [ if rguard ] ugtr rsrc1 rsrc2 → rdest
function: if rguard then { if (unsigned)rsrc1 > (unsigned)rsrc2 then rdest ← 1 else rdest ← 0 }
attributes: function unit alu; operation code 33; number of operands 2; modifier no; modifier range -; latency 1; issue slots 1, 2, 3, 4, 5
description: the ugtr operation sets the destination register, rdest, to 1 if the first argument, rsrc1, is greater than the second argument, rsrc2; otherwise, rdest is set to 0. the arguments are treated as unsigned integers.
the ugtr operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.
examples (initial values | operation | result):
r30 = 3, r40 = 4 | ugtr r30 r40 → r80 | r80 ← 0
r10 = 0, r60 = 0x100, r30 = 3 | if r10 ugtr r60 r30 → r50 | no change, since guard is false
r20 = 1, r50 = 0x1000, r60 = 0x100 | if r20 ugtr r50 r60 → r90 | r90 ← 1
r70 = 0x80000000, r40 = 4 | ugtr r70 r40 → r100 | r100 ← 1
r70 = 0x80000000 | ugtr r70 r70 → r110 | r110 ← 0
see also: igtr ugtri
ugtri: unsigned compare greater with immediate
syntax: [ if rguard ] ugtri(n) rsrc1 → rdest
function: if rguard then { if (unsigned)rsrc1 > (unsigned)n then rdest ← 1 else rdest ← 0 }
attributes: function unit alu; operation code 34; number of operands 1; modifier 7 bits; modifier range 0..127; latency 1; issue slots 1, 2, 3, 4, 5
description: the ugtri operation sets the destination register, rdest, to 1 if the first argument, rsrc1, is greater than the opcode modifier, n; otherwise, rdest is set to 0. the arguments are treated as unsigned integers.
the ugtri operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.
examples (initial values | operation | result):
r30 = 3 | ugtri(2) r30 → r80 | r80 ← 1
r30 = 3 | ugtri(3) r30 → r90 | r90 ← 0
r30 = 3 | ugtri(4) r30 → r100 | r100 ← 0
r10 = 0, r40 = 0x100 | if r10 ugtri(63) r40 → r50 | no change, since guard is false
r20 = 1, r40 = 0x100 | if r20 ugtri(63) r40 → r100 | r100 ← 1
r60 = 0x80000000 | ugtri(127) r60 → r120 | r120 ← 1
see also: igtri ugtr
uimm: unsigned immediate
syntax: uimm(n) → rdest
function: rdest ← n
attributes: function unit const; operation code 191; number of operands 0; modifier 32 bits; modifier range 0..0xffffffff; latency 1; issue slots 1, 2, 3, 4, 5
description: the uimm operation writes the unsigned 32-bit opcode modifier n into rdest. note: this operation is not guarded.
examples (initial values | operation | result):
(none) | uimm(2) → r10 | r10 ← 2
(none) | uimm(0x100) → r20 | r20 ← 0x100
(none) | uimm(0xfffc0000) → r30 | r30 ← 0xfffc0000
see also: iimm
uld16: unsigned 16-bit load (pseudo-op for uld16d(0))
syntax: [ if rguard ] uld16 rsrc1 → rdest
function: if rguard then {
  if pcsw.bytesex = little_endian then bs ← 1 else bs ← 0
  temp<7:0> ← mem[rsrc1 + (1 ⊕ bs)]
  temp<15:8> ← mem[rsrc1 + (0 ⊕ bs)]
  rdest ← zero_ext16to32(temp<15:0>)
}
attributes: function unit dmem; operation code 197; number of operands 1; modifier no; modifier range -; latency 3; issue slots 4, 5
description: the uld16 operation is a pseudo operation transformed by the scheduler into an uld16d(0) with the same argument. (note: pseudo operations cannot be used in assembly source files.)
the uld16 operation loads the 16-bit memory value from the address contained in rsrc1, zero extends it to 32 bits, and writes the result in rdest. if the memory address contained in rsrc1 is not a multiple of 2, the result of uld16 is undefined but no exception will be raised. this load operation is performed as little-endian or big-endian depending on the current setting of the bytesex bit in the pcsw.
the result of an access by uld16 to the mmio address aperture is undefined; access to the mmio aperture is defined only for 32-bit loads and stores.
the uld16 operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register and the occurrence of side effects. if the lsb of rguard is 1, rdest is written and the data cache status bits are updated if the addressed locations are cacheable. if the lsb of rguard is 0, rdest is not changed and uld16 has no side effects whatever.
examples (initial values | operation | result):
r10 = 0xd00, [0xd00] = 0x22, [0xd01] = 0x11 | uld16 r10 → r60 | r60 ← 0x00002211
r30 = 0, r20 = 0xd04, [0xd04] = 0x84, [0xd05] = 0x33 | if r30 uld16 r20 → r70 | no change, since guard is false
r40 = 1, r20 = 0xd04, [0xd04] = 0x84, [0xd05] = 0x33 | if r40 uld16 r20 → r80 | r80 ← 0x00008433
r50 = 0xd01 | uld16 r50 → r90 | r90 undefined (0xd01 is not a multiple of 2)
see also: uld16d ild16 ild16d uld16r ild16r uld16x ild16x
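The byte selection in the uld16 function can be modelled in a few lines. The sketch below reads the two byte-offset expressions as an exclusive-or of the constant with bs, which is an assumption, though it reproduces the example results for the big-endian setting. Memory is represented by a plain dictionary, and the names `uld16` and `little_endian` are hypothetical; alignment and guard handling are not modelled.

```python
def uld16(mem: dict, addr: int, little_endian: bool = False) -> int:
    """Load a zero-extended 16-bit value; addr is assumed to be even."""
    bs = 1 if little_endian else 0
    low = mem[addr + (1 ^ bs)]   # temp<7:0>
    high = mem[addr + (0 ^ bs)]  # temp<15:8>
    return (high << 8) | low     # upper 16 bits of the result stay zero


mem = {0xD00: 0x22, 0xD01: 0x11, 0xD04: 0x84, 0xD05: 0x33}
print(hex(uld16(mem, 0xD00)))                      # big-endian view
print(hex(uld16(mem, 0xD00, little_endian=True)))  # same bytes, little-endian view
```

In big-endian mode the byte at the lower address becomes the most-significant byte, which is the setting the examples appear to use.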
uld16d: unsigned 16-bit load with displacement
syntax: [ if rguard ] uld16d(d) rsrc1 → rdest
function: if rguard then {
  if pcsw.bytesex = little_endian then bs ← 1 else bs ← 0
  temp<7:0> ← mem[rsrc1 + d + (1 ⊕ bs)]
  temp<15:8> ← mem[rsrc1 + d + (0 ⊕ bs)]
  rdest ← zero_ext16to32(temp<15:0>)
}
attributes: function unit dmem; operation code 197; number of operands 1; modifier 7 bits; modifier range -128..126 by 2; latency 3; issue slots 4, 5
description: the uld16d operation loads the 16-bit memory value from the address computed by rsrc1 + d, zero extends it to 32 bits, and writes the result in rdest. the d value is an opcode modifier, must be in the range -128 to 126 inclusive, and must be a multiple of 2. if the memory address computed by rsrc1 + d is not a multiple of 2, the result of uld16d is undefined but no exception will be raised. this load operation is performed as little-endian or big-endian depending on the current setting of the bytesex bit in the pcsw.
the result of an access by uld16d to the mmio address aperture is undefined; access to the mmio aperture is defined only for 32-bit loads and stores.
the uld16d operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register and the occurrence of side effects. if the lsb of rguard is 1, rdest is written and the data cache status bits are updated if the addressed locations are cacheable. if the lsb of rguard is 0, rdest is not changed and uld16d has no side effects whatever.
examples (initial values | operation | result):
r10 = 0xd00, [0xd02] = 0x22, [0xd03] = 0x11 | uld16d(2) r10 → r60 | r60 ← 0x00002211
r30 = 0, r20 = 0xd04, [0xd00] = 0x84, [0xd01] = 0x33 | if r30 uld16d(-4) r20 → r70 | no change, since guard is false
r40 = 1, r20 = 0xd04, [0xd00] = 0x84, [0xd01] = 0x33 | if r40 uld16d(-4) r20 → r80 | r80 ← 0x00008433
r50 = 0xd01 | uld16d(-4) r50 → r90 | r90 undefined (0xd01 + (-4) is not a multiple of 2)
see also: uld16 ild16 ild16d uld16r ild16r uld16x ild16x
uld16r: unsigned 16-bit load with index
syntax: [ if rguard ] uld16r rsrc1 rsrc2 → rdest
function: if rguard then {
  if pcsw.bytesex = little_endian then bs ← 1 else bs ← 0
  temp<7:0> ← mem[rsrc1 + rsrc2 + (1 ⊕ bs)]
  temp<15:8> ← mem[rsrc1 + rsrc2 + (0 ⊕ bs)]
  rdest ← zero_ext16to32(temp<15:0>)
}
attributes: function unit dmem; operation code 198; number of operands 2; modifier no; modifier range -; latency 3; issue slots 4, 5
description: the uld16r operation loads the 16-bit memory value from the address computed by rsrc1 + rsrc2, zero extends it to 32 bits, and writes the result in rdest. if the memory address computed by rsrc1 + rsrc2 is not a multiple of 2, the result of uld16r is undefined but no exception will be raised. this load operation is performed as little-endian or big-endian depending on the current setting of the bytesex bit in the pcsw.
the result of an access by uld16r to the mmio address aperture is undefined; access to the mmio aperture is defined only for 32-bit loads and stores.
the uld16r operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register and the occurrence of side effects. if the lsb of rguard is 1, rdest is written and the data cache status bits are updated if the addressed locations are cacheable. if the lsb of rguard is 0, rdest is not changed and uld16r has no side effects whatever.
examples (initial values | operation | result):
r10 = 0xd00, r20 = 2, [0xd02] = 0x22, [0xd03] = 0x11 | uld16r r10 r20 → r80 | r80 ← 0x00002211
r50 = 0, r40 = 0xd04, r30 = 0xfffffffc, [0xd00] = 0x84, [0xd01] = 0x33 | if r50 uld16r r40 r30 → r90 | no change, since guard is false
r60 = 1, r40 = 0xd04, r30 = 0xfffffffc, [0xd00] = 0x84, [0xd01] = 0x33 | if r60 uld16r r40 r30 → r100 | r100 ← 0x00008433
r70 = 0xd01, r30 = 0xfffffffc | uld16r r70 r30 → r110 | r110 undefined (0xd01 + (-4) is not a multiple of 2)
see also: uld16 ild16 uld16d ild16d ild16r uld16x ild16x
uld16x: unsigned 16-bit load with scaled index
syntax: [ if rguard ] uld16x rsrc1 rsrc2 → rdest
function: if rguard then {
  if pcsw.bytesex = little_endian then bs ← 1 else bs ← 0
  temp<7:0> ← mem[rsrc1 + (2 × rsrc2) + (1 ⊕ bs)]
  temp<15:8> ← mem[rsrc1 + (2 × rsrc2) + (0 ⊕ bs)]
  rdest ← zero_ext16to32(temp<15:0>)
}
attributes: function unit dmem; operation code 199; number of operands 2; modifier no; modifier range -; latency 3; issue slots 4, 5
description: the uld16x operation loads the 16-bit memory value from the address computed by rsrc1 + 2 × rsrc2, zero extends it to 32 bits, and writes the result in rdest. if the memory address computed by rsrc1 + 2 × rsrc2 is not a multiple of 2, the result of uld16x is undefined but no exception will be raised. this load operation is performed as little-endian or big-endian depending on the current setting of the bytesex bit in the pcsw.
the result of an access by uld16x to the mmio address aperture is undefined; access to the mmio aperture is defined only for 32-bit loads and stores.
the uld16x operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register and the occurrence of side effects. if the lsb of rguard is 1, rdest is written and the data cache status bits are updated if the addressed locations are cacheable. if the lsb of rguard is 0, rdest is not changed and uld16x has no side effects whatever.
examples (initial values | operation | result):
r10 = 0xd00, r30 = 1, [0xd02] = 0x22, [0xd03] = 0x11 | uld16x r10 r30 → r100 | r100 ← 0x00002211
r50 = 0, r40 = 0xd04, r20 = 0xfffffffe, [0xd00] = 0x84, [0xd01] = 0x33 | if r50 uld16x r40 r20 → r80 | no change, since guard is false
r60 = 1, r40 = 0xd04, r20 = 0xfffffffe, [0xd00] = 0x84, [0xd01] = 0x33 | if r60 uld16x r40 r20 → r90 | r90 ← 0x00008433
r70 = 0xd01, r30 = 1 | uld16x r70 r30 → r110 | r110 undefined (0xd01 + 2 × 1 is not a multiple of 2)
see also: uld16 ild16 uld16d ild16d uld16r ild16r ild16x
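The scaled-index examples rely on the index register acting as a negative offset when it holds a value such as 0xfffffffe. A minimal sketch of the address arithmetic follows, assuming the sum wraps modulo 2^32 (which is what the examples imply); the helper names `uld16x_address` and `is_aligned16` are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def uld16x_address(base: int, index: int) -> int:
    """Effective address rsrc1 + 2 * rsrc2, assumed to wrap at 32 bits."""
    return (base + 2 * index) & MASK32


def is_aligned16(addr: int) -> bool:
    """A 16-bit load has a defined result only at a multiple of 2."""
    return addr % 2 == 0


addr = uld16x_address(0xD04, 0xFFFFFFFE)  # index behaves like -2
print(hex(addr), is_aligned16(addr))
print(is_aligned16(uld16x_address(0xD01, 1)))  # odd base gives an odd address
```

Since the index is doubled, only an odd base address can produce a misaligned access.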
uld8: unsigned 8-bit load (pseudo-op for uld8d(0))
syntax: [ if rguard ] uld8 rsrc1 → rdest
function: if rguard then rdest ← zero_ext8to32(mem[rsrc1])
attributes: function unit dmem; operation code 8; number of operands 1; modifier no; modifier range -; latency 3; issue slots 4, 5
description: the uld8 operation is a pseudo operation transformed by the scheduler into an uld8d(0) with the same argument. (note: pseudo operations cannot be used in assembly source files.)
the uld8 operation loads the 8-bit memory value from the address contained in rsrc1, zero extends it to 32 bits, and writes the result in rdest. this operation does not depend on the bytesex bit in the pcsw since only a single byte is loaded.
the result of an access by uld8 to the mmio address aperture is undefined; access to the mmio aperture is defined only for 32-bit loads and stores.
the uld8 operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register and the occurrence of side effects. if the lsb of rguard is 1, rdest is written and the data cache status bits are updated if the addressed location is cacheable. if the lsb of rguard is 0, rdest is not changed and uld8 has no side effects whatever.
examples (initial values | operation | result):
r10 = 0xd00, [0xd00] = 0x22 | uld8 r10 → r60 | r60 ← 0x00000022
r30 = 0, r20 = 0xd04, [0xd04] = 0x84 | if r30 uld8 r20 → r70 | no change, since guard is false
r40 = 1, r20 = 0xd04, [0xd04] = 0x84 | if r40 uld8 r20 → r80 | r80 ← 0x00000084
r50 = 0xd01, [0xd01] = 0x33 | uld8 r50 → r90 | r90 ← 0x00000033
see also: ild8 uld8d ild8d uld8r ild8r
uld8d: unsigned 8-bit load with displacement
syntax: [ if rguard ] uld8d(d) rsrc1 → rdest
function: if rguard then rdest ← zero_ext8to32(mem[rsrc1 + d])
attributes: function unit dmem; operation code 8; number of operands 1; modifier 7 bits; modifier range -64..63; latency 3; issue slots 4, 5
description: the uld8d operation loads the 8-bit memory value from the address computed by rsrc1 + d, zero extends it to 32 bits, and writes the result in rdest. the d value is an opcode modifier in the range -64 to 63 inclusive. this operation does not depend on the bytesex bit in the pcsw since only a single byte is loaded.
the result of an access by uld8d to the mmio address aperture is undefined; access to the mmio aperture is defined only for 32-bit loads and stores.
the uld8d operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register and the occurrence of side effects. if the lsb of rguard is 1, rdest is written and the data cache status bits are updated if the addressed location is cacheable. if the lsb of rguard is 0, rdest is not changed and uld8d has no side effects whatever.
examples (initial values | operation | result):
r10 = 0xd00, [0xd02] = 0x22 | uld8d(2) r10 → r60 | r60 ← 0x00000022
r30 = 0, r20 = 0xd04, [0xd00] = 0x84 | if r30 uld8d(-4) r20 → r70 | no change, since guard is false
r40 = 1, r20 = 0xd04, [0xd00] = 0x84 | if r40 uld8d(-4) r20 → r80 | r80 ← 0x00000084
r50 = 0xd05, [0xd01] = 0x33 | uld8d(-4) r50 → r90 | r90 ← 0x00000033
see also: uld8 ild8 ild8d uld8r ild8r
uld8r: unsigned 8-bit load with index
syntax: [ if rguard ] uld8r rsrc1 rsrc2 → rdest
function: if rguard then rdest ← zero_ext8to32(mem[rsrc1 + rsrc2])
attributes: function unit dmem; operation code 194; number of operands 2; modifier no; modifier range -; latency 3; issue slots 4, 5
description: the uld8r operation loads the 8-bit memory value from the address computed by rsrc1 + rsrc2, zero extends it to 32 bits, and writes the result in rdest. this operation does not depend on the bytesex bit in the pcsw since only a single byte is loaded.
the result of an access by uld8r to the mmio address aperture is undefined; access to the mmio aperture is defined only for 32-bit loads and stores.
the uld8r operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register and the occurrence of side effects. if the lsb of rguard is 1, rdest is written and the data cache status bits are updated if the addressed location is cacheable. if the lsb of rguard is 0, rdest is not changed and uld8r has no side effects whatever.
examples (initial values | operation | result):
r10 = 0xd00, r20 = 2, [0xd02] = 0x22 | uld8r r10 r20 → r80 | r80 ← 0x00000022
r50 = 0, r40 = 0xd04, r30 = 0xfffffffc, [0xd00] = 0x84 | if r50 uld8r r40 r30 → r90 | no change, since guard is false
r60 = 1, r40 = 0xd04, r30 = 0xfffffffc, [0xd00] = 0x84 | if r60 uld8r r40 r30 → r100 | r100 ← 0x00000084
r70 = 0xd05, r30 = 0xfffffffc, [0xd01] = 0x33 | uld8r r70 r30 → r110 | r110 ← 0x00000033
see also: uld8 ild8 uld8d ild8d ild8r
uleq: unsigned compare less or equal (pseudo-op for ugeq)
syntax: [ if rguard ] uleq rsrc1 rsrc2 → rdest
function: if rguard then { if (unsigned)rsrc1 <= (unsigned)rsrc2 then rdest ← 1 else rdest ← 0 }
attributes: function unit alu; operation code 35; number of operands 2; modifier no; modifier range -; latency 1; issue slots 1, 2, 3, 4, 5
description: the uleq operation is a pseudo operation transformed by the scheduler into an ugeq with the arguments exchanged (uleq's rsrc1 is ugeq's rsrc2 and vice versa). (note: pseudo operations cannot be used in assembly source files.)
the uleq operation sets the destination register, rdest, to 1 if the first argument, rsrc1, is less than or equal to the second argument, rsrc2; otherwise, rdest is set to 0. the arguments are treated as unsigned integers.
the uleq operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.
examples (initial values | operation | result):
r30 = 3, r40 = 4 | uleq r30 r40 → r80 | r80 ← 1
r10 = 0, r60 = 0x100, r30 = 3 | if r10 uleq r60 r30 → r50 | no change, since guard is false
r20 = 1, r50 = 0x1000, r60 = 0x100 | if r20 uleq r50 r60 → r90 | r90 ← 0
r70 = 0x80000000, r40 = 4 | uleq r70 r40 → r100 | r100 ← 0
r70 = 0x80000000 | uleq r70 r70 → r110 | r110 ← 1
see also: ileq uleqi
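The pseudo-op relationship can be made concrete: a less-or-equal compare is a greater-or-equal compare with the operands exchanged. The sketch below is a plain Python model of that rewrite, not the scheduler's actual code; the guard is omitted for brevity, and the function names simply mirror the operation names.

```python
MASK32 = 0xFFFFFFFF


def ugeq(src1: int, src2: int) -> int:
    """Unsigned greater-or-equal on 32-bit values."""
    return 1 if (src1 & MASK32) >= (src2 & MASK32) else 0


def uleq(src1: int, src2: int) -> int:
    """a <= b is the same test as b >= a, so swap the operands."""
    return ugeq(src2, src1)


print(uleq(3, 4), uleq(0x1000, 0x100), uleq(0x80000000, 0x80000000))
```

Because both forms map to one hardware operation, uleq shares ugeq's operation code, latency, and issue slots in the attributes.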
uleqi: unsigned compare less or equal with immediate
syntax: [ if rguard ] uleqi(n) rsrc1 → rdest
function: if rguard then { if (unsigned)rsrc1 <= (unsigned)n then rdest ← 1 else rdest ← 0 }
attributes: function unit alu; operation code 43; number of operands 1; modifier 7 bits; modifier range 0..127; latency 1; issue slots 1, 2, 3, 4, 5
description: the uleqi operation sets the destination register, rdest, to 1 if the first argument, rsrc1, is less than or equal to the opcode modifier, n; otherwise, rdest is set to 0. the arguments are treated as unsigned integers.
the uleqi operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.
examples (initial values | operation | result):
r30 = 3 | uleqi(2) r30 → r80 | r80 ← 0
r30 = 3 | uleqi(3) r30 → r90 | r90 ← 1
r30 = 3 | uleqi(4) r30 → r100 | r100 ← 1
r10 = 0, r40 = 0x100 | if r10 uleqi(63) r40 → r50 | no change, since guard is false
r20 = 1, r40 = 0x100 | if r20 uleqi(63) r40 → r100 | r100 ← 0
r60 = 0x80000000 | uleqi(127) r60 → r120 | r120 ← 0
see also: uleq ileqi
ules: unsigned compare less (pseudo-op for ugtr)
syntax: [ if rguard ] ules rsrc1 rsrc2 → rdest
function: if rguard then { if (unsigned)rsrc1 < (unsigned)rsrc2 then rdest ← 1 else rdest ← 0 }
attributes: function unit alu; operation code 33; number of operands 2; modifier no; modifier range -; latency 1; issue slots 1, 2, 3, 4, 5
description: the ules operation is a pseudo operation transformed by the scheduler into an ugtr with the arguments exchanged (ules's rsrc1 is ugtr's rsrc2 and vice versa). (note: pseudo operations cannot be used in assembly source files.)
the ules operation sets the destination register, rdest, to 1 if the first argument, rsrc1, is less than the second argument, rsrc2; otherwise, rdest is set to 0. the arguments are treated as unsigned integers.
the ules operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.
examples (initial values | operation | result):
r30 = 3, r40 = 4 | ules r30 r40 → r80 | r80 ← 1
r10 = 0, r60 = 0x100, r30 = 3 | if r10 ules r60 r30 → r50 | no change, since guard is false
r20 = 1, r50 = 0x1000, r60 = 0x100 | if r20 ules r50 r60 → r90 | r90 ← 0
r70 = 0x80000000, r40 = 4 | ules r70 r40 → r100 | r100 ← 0
r70 = 0x80000000 | ules r70 r70 → r110 | r110 ← 0
see also: iles ugtr
ulesi: unsigned compare less with immediate
syntax: [ if rguard ] ulesi(n) rsrc1 → rdest
function: if rguard then { if (unsigned)rsrc1 < (unsigned)n then rdest ← 1 else rdest ← 0 }
attributes: function unit alu; operation code 41; number of operands 1; modifier 7 bits; modifier range 0..127; latency 1; issue slots 1, 2, 3, 4, 5
description: the ulesi operation sets the destination register, rdest, to 1 if the first argument, rsrc1, is less than the opcode modifier, n; otherwise, rdest is set to 0. the arguments are treated as unsigned integers.
the ulesi operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.
examples (initial values | operation | result):
r30 = 3 | ulesi(2) r30 → r80 | r80 ← 0
r30 = 3 | ulesi(3) r30 → r90 | r90 ← 0
r30 = 3 | ulesi(4) r30 → r100 | r100 ← 1
r10 = 0, r40 = 0x100 | if r10 ulesi(63) r40 → r50 | no change, since guard is false
r20 = 1, r40 = 0x100 | if r20 ulesi(63) r40 → r100 | r100 ← 0
r60 = 0x80000000 | ulesi(127) r60 → r120 | r120 ← 0
see also: ules ilesi
ume8ii: unsigned sum of absolute values of signed 8-bit differences
syntax: [ if rguard ] ume8ii rsrc1 rsrc2 → rdest
function: if rguard then rdest ←
  abs_val(sign_ext8to32(rsrc1<31:24>) - sign_ext8to32(rsrc2<31:24>)) +
  abs_val(sign_ext8to32(rsrc1<23:16>) - sign_ext8to32(rsrc2<23:16>)) +
  abs_val(sign_ext8to32(rsrc1<15:8>) - sign_ext8to32(rsrc2<15:8>)) +
  abs_val(sign_ext8to32(rsrc1<7:0>) - sign_ext8to32(rsrc2<7:0>))
attributes: function unit dspalu; operation code 64; number of operands 2; modifier no; modifier range -; latency 2; issue slots 1, 3
description: as shown below, the ume8ii operation computes four separate differences of the four pairs of corresponding signed 8-bit bytes of rsrc1 and rsrc2; the absolute values of the four differences are summed, and the sum is written to rdest. all computations are performed without loss of precision.
the ume8ii operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.
examples (initial values | operation | result):
r80 = 0x0a14f6f6, r30 = 0x1414ecf6 | ume8ii r80 r30 → r100 | r100 ← 0x14
r10 = 0, r80 = 0x0a14f6f6, r30 = 0x1414ecf6 | if r10 ume8ii r80 r30 → r70 | no change, since guard is false
r20 = 1, r90 = 0x64649c9c, r40 = 0x649c649c | if r20 ume8ii r90 r40 → r110 | r110 ← 0x190
r40 = 0x649c649c, r90 = 0x64649c9c | ume8ii r40 r90 → r120 | r120 ← 0x190
r50 = 0x80808080, r60 = 0x7f7f7f7f | ume8ii r50 r60 → r125 | r125 ← 0x3fc
(figure: the four signed bytes of rsrc1 and the four signed bytes of rsrc2 are subtracted pairwise, the absolute values are added, and the unsigned sum is written to rdest)
see also: ume8uu
ume8uu: sum of absolute values of unsigned 8-bit differences
syntax: [ if rguard ] ume8uu rsrc1 rsrc2 → rdest
function: if rguard then rdest ←
  abs_val(zero_ext8to32(rsrc1<31:24>) - zero_ext8to32(rsrc2<31:24>)) +
  abs_val(zero_ext8to32(rsrc1<23:16>) - zero_ext8to32(rsrc2<23:16>)) +
  abs_val(zero_ext8to32(rsrc1<15:8>) - zero_ext8to32(rsrc2<15:8>)) +
  abs_val(zero_ext8to32(rsrc1<7:0>) - zero_ext8to32(rsrc2<7:0>))
attributes: function unit dspalu; operation code 26; number of operands 2; modifier no; modifier range -; latency 2; issue slots 1, 3
description: as shown below, the ume8uu operation computes four separate differences of the four pairs of corresponding unsigned 8-bit bytes of rsrc1 and rsrc2. the absolute values of the four differences are summed and the result is written to rdest. all computations are performed without loss of precision.
the ume8uu operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.
examples (initial values | operation | result):
r80 = 0x0a14f6f6, r30 = 0x1414ecf6 | ume8uu r80 r30 → r100 | r100 ← 0x14
r10 = 0, r80 = 0x0a14f6f6, r30 = 0x1414ecf6 | if r10 ume8uu r80 r30 → r70 | no change, since guard is false
r20 = 1, r90 = 0x64649c9c, r40 = 0x649c649c | if r20 ume8uu r90 r40 → r110 | r110 ← 0x70
r40 = 0x649c649c, r90 = 0x64649c9c | ume8uu r40 r90 → r120 | r120 ← 0x70
r50 = 0x80808080, r60 = 0x7f7f7f7f | ume8uu r50 r60 → r125 | r125 ← 0x4
(figure: the four unsigned bytes of rsrc1 and the four unsigned bytes of rsrc2 are subtracted pairwise, the absolute values are added, and the unsigned sum is written to rdest)
see also: ume8ii
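The difference between ume8uu and its signed counterpart ume8ii lies only in how each byte is interpreted before subtraction. The following Python sketch models both; the helper names `_bytes` and `_signed8` are hypothetical, and the guard is left out.

```python
def _bytes(word: int) -> list:
    """Split a 32-bit word into its four bytes, most-significant first."""
    return [(word >> shift) & 0xFF for shift in (24, 16, 8, 0)]


def _signed8(byte: int) -> int:
    """Reinterpret an 8-bit value as two's-complement."""
    return byte - 256 if byte >= 0x80 else byte


def ume8uu(src1: int, src2: int) -> int:
    """Sum of absolute differences of unsigned bytes."""
    return sum(abs(a - b) for a, b in zip(_bytes(src1), _bytes(src2)))


def ume8ii(src1: int, src2: int) -> int:
    """Sum of absolute differences of signed bytes."""
    return sum(abs(_signed8(a) - _signed8(b))
               for a, b in zip(_bytes(src1), _bytes(src2)))


# Same operands, different byte interpretation, different sums.
print(hex(ume8uu(0x80808080, 0x7F7F7F7F)), hex(ume8ii(0x80808080, 0x7F7F7F7F)))
```

The bytes 0x80 and 0x7f are adjacent as unsigned values (128 and 127) but sit at opposite ends of the signed range (-128 and 127), which is why the two operations give such different sums for the last example.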
umin: minimum of unsigned values (pseudo-op for uclipu)
syntax: [ if rguard ] umin rsrc1 rsrc2 → rdest
function: if rguard then { if rsrc1 > rsrc2 then rdest ← rsrc2 else rdest ← rsrc1 }
attributes: function unit dspalu; operation code 76; number of operands 2; modifier no; modifier range -; latency 2; issue slots 1, 3
description: the umin operation returns the minimum value of rsrc1 and rsrc2. the arguments rsrc1 and rsrc2 are considered unsigned integers.
the umin operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.
examples (initial values | operation | result):
r30 = 0x80, r40 = 0x7f | umin r30 r40 → r50 | r50 ← 0x7f
r10 = 0, r60 = 0x12345678, r70 = 0xabc | if r10 umin r60 r70 → r80 | no change, since guard is false
r20 = 1, r60 = 0x12345678, r70 = 0xabc | if r20 umin r60 r70 → r90 | r90 ← 0xabc
r100 = 0x80000000, r110 = 0x3fffff | umin r100 r110 → r120 | r120 ← 0x3fffff
see also: iclipi uclipi imin imax
umul: unsigned multiply
syntax: [ if rguard ] umul rsrc1 rsrc2 → rdest
function: if rguard then {
  temp ← zero_ext32to64(rsrc1) × zero_ext32to64(rsrc2)
  rdest ← temp<31:0>
}
attributes: function unit ifmul; operation code 138; number of operands 2; modifier no; modifier range -; latency 3; issue slots 2, 3
description: as shown below, the umul operation computes the product rsrc1 × rsrc2 and writes the least-significant 32 bits of the full 64-bit product into rdest. the operands are considered unsigned integers. no overflow or underflow detection is performed.
the umul operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.
examples (initial values | operation | result):
r60 = 0x100 | umul r60 r60 → r80 | r80 ← 0x10000
r10 = 0, r60 = 0x100, r30 = 0xf11 | if r10 umul r60 r30 → r50 | no change, since guard is false
r20 = 1, r60 = 0x100, r30 = 0xf11 | if r20 umul r60 r30 → r90 | r90 ← 0xf1100
r70 = 0x100, r40 = 0xffffff9c | umul r70 r40 → r100 | r100 ← 0xffff9c00
(figure: unsigned rsrc1 times unsigned rsrc2 gives an unsigned 64-bit result; bits 31..0 are written to rdest)
see also: imul imulm umulm dspimul dspumul dspidualmul quadumulmsb fmul
umulm: unsigned multiply, return most-significant 32 bits
syntax: [ if rguard ] umulm rsrc1 rsrc2 → rdest
function: if rguard then {
  temp ← zero_ext32to64(rsrc1) × zero_ext32to64(rsrc2)
  rdest ← temp<63:32>
}
attributes: function unit ifmul; operation code 140; number of operands 2; modifier no; modifier range -; latency 3; issue slots 2, 3
description: as shown below, the umulm operation computes the product rsrc1 × rsrc2 and writes the most-significant 32 bits of the 64-bit product into rdest. the operands are considered unsigned integers.
the umulm operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.
examples (initial values | operation | result):
r60 = 0x10000 | umulm r60 r60 → r80 | r80 ← 0x00000001
r10 = 0, r60 = 0x100, r30 = 0xf11 | if r10 umulm r60 r30 → r50 | no change, since guard is false
r20 = 1, r60 = 0x10001000, r30 = 0xf1100000 | if r20 umulm r60 r30 → r90 | r90 ← 0xf110f11
r70 = 0xffffff00, r40 = 0x100 | umulm r70 r40 → r100 | r100 ← 0xff
(figure: unsigned rsrc1 times unsigned rsrc2 gives an unsigned 64-bit result; bits 63..32 are written to rdest)
see also: umulm dspimul dspumul dspidualmul quadumulmsb fmul
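umul and umulm return the two halves of the same 64-bit product, so a pair of them recovers the full result. A minimal Python model, with the guard omitted and function names that simply mirror the operations:

```python
MASK32 = 0xFFFFFFFF


def umul(src1: int, src2: int) -> int:
    """Least-significant 32 bits of the unsigned 64-bit product."""
    return ((src1 & MASK32) * (src2 & MASK32)) & MASK32


def umulm(src1: int, src2: int) -> int:
    """Most-significant 32 bits of the unsigned 64-bit product."""
    return ((src1 & MASK32) * (src2 & MASK32)) >> 32


a, b = 0x10001000, 0xF1100000
full = (umulm(a, b) << 32) | umul(a, b)  # reassemble the 64-bit product
print(hex(umulm(a, b)), hex(umul(a, b)), full == a * b)
```

Since umul performs no overflow detection, comparing umulm against zero is one way to tell whether the 32-bit result of umul was truncated.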
uneq: unsigned compare not equal (pseudo-op for ineq)
syntax: [ if rguard ] uneq rsrc1 rsrc2 → rdest
function: if rguard then { if rsrc1 != rsrc2 then rdest ← 1 else rdest ← 0 }
attributes: function unit alu; operation code 39; number of operands 2; modifier no; modifier range -; latency 1; issue slots 1, 2, 3, 4, 5
description: the uneq operation is a pseudo operation transformed by the scheduler into an ineq. (note: pseudo operations cannot be used in assembly source files.)
the uneq operation sets the destination register, rdest, to 1 if the two arguments, rsrc1 and rsrc2, are not equal; otherwise, rdest is set to 0.
the uneq operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.
examples (initial values | operation | result):
r30 = 3, r40 = 4 | uneq r30 r40 → r80 | r80 ← 1
r10 = 0, r60 = 0x1000, r30 = 3 | if r10 uneq r60 r30 → r50 | no change, since guard is false
r20 = 1, r50 = 0x1000, r60 = 0x1000 | if r20 uneq r50 r60 → r90 | r90 ← 0
r70 = 0x80000000, r40 = 4 | uneq r70 r40 → r100 | r100 ← 1
r70 = 0x80000000 | uneq r70 r70 → r110 | r110 ← 0
see also: ineq igtr uneqi
uneqi: unsigned compare not equal with immediate
syntax: [ if rguard ] uneqi(n) rsrc1 → rdest
function: if rguard then { if (unsigned)rsrc1 != (unsigned)n then rdest ← 1 else rdest ← 0 }
attributes: function unit alu; operation code 40; number of operands 1; modifier 7 bits; modifier range 0..127; latency 1; issue slots 1, 2, 3, 4, 5
description: the uneqi operation sets the destination register, rdest, to 1 if the first argument, rsrc1, is not equal to the opcode modifier, n; otherwise, rdest is set to 0. the arguments are treated as unsigned integers.
the uneqi operation optionally takes a guard, specified in rguard. if a guard is present, its lsb controls the modification of the destination register. if the lsb of rguard is 1, rdest is written; otherwise, rdest is not changed.
examples (initial values | operation | result):
r30 = 3 | uneqi(2) r30 → r80 | r80 ← 1
r30 = 3 | uneqi(3) r30 → r90 | r90 ← 0
r30 = 3 | uneqi(4) r30 → r100 | r100 ← 1
r10 = 0, r40 = 0x100 | if r10 uneqi(63) r40 → r50 | no change, since guard is false
r20 = 1, r40 = 0x100 | if r20 uneqi(63) r40 → r100 | r100 ← 1
r60 = 0x80000000 | uneqi(127) r60 → r120 | r120 ← 1
see also: uneq ineqi ueqli
writedpc
Write destination program counter

SYNTAX
    [ IF rguard ] writedpc rsrc1

FUNCTION
    if rguard then {
        DPC ← rsrc1
    }

ATTRIBUTES
    Function unit        fcomp
    Operation code       160
    Number of operands   1
    Modifier             No
    Modifier range       -
    Latency              1
    Issue slots          3

DESCRIPTION
The writedpc operation copies the value of rsrc1 to the DPC (Destination Program Counter) processor register. Whenever a hardware update (during an interruptible jump) and a software update (through a writedpc) coincide, the software update takes precedence.

Interruptible jumps write their target address to the DPC. The value of DPC is intended to be used by an exception-handling routine as a jump address to resume execution of the program that was running before the exception was taken.

The writedpc operation optionally takes a guard, specified in rguard. If a guard is present, its LSB controls the modification of DPC. If the LSB of rguard is 1, DPC is written; otherwise, DPC is unchanged.

EXAMPLES
    Initial Values              Operation               Result
    r30 = 0xbeebee              writedpc r30            DPC ← 0xbeebee
    r20 = 0, r31 = 0xabba       IF r20 writedpc r31     no change, since guard is false
    r21 = 1, r31 = 0xabba       IF r21 writedpc r31     DPC ← 0xabba

SEE ALSO
    readdpc writespc ijmpf ijmpi ijmpt
writepcsw
Write program control and status word

SYNTAX
    [ IF rguard ] writepcsw rsrc1 rsrc2

FUNCTION
    if rguard then {
        PCSW ← (PCSW & ~rsrc2) | (rsrc1 & rsrc2)
    }

ATTRIBUTES
    Function unit        fcomp
    Operation code       161
    Number of operands   1
    Modifier             No
    Modifier range       -
    Latency              1
    Issue slots          3

DESCRIPTION
The writepcsw operation copies the value of rsrc1 to the PCSW (Program Control and Status Word) processor register using rsrc2 as a mask. A bit in PCSW is affected by writepcsw only if the corresponding bit in rsrc2 is set to 1; the value of any bit in PCSW with a corresponding 0-bit in rsrc2 will not be changed by writepcsw. Whenever a hardware update (e.g., when a floating-point exception is raised) and a software update (through a writepcsw) coincide, the PCSW bits currently being updated by hardware will reflect the hardware-determined value, while the bits not being affected by hardware will reflect the value in the writepcsw operand. The layout of PCSW is shown below. The programmer should take care not to alter undef fields in the PCSW.

Fields in the PCSW have two chief purposes: to control aspects of processor operation and to record events that occur during program execution. Thus, writepcsw can be used to effect changes in some aspects of processor operation and to clear fields that record events; this operation can also be used to restore state before resuming an idled task in a multi-tasking environment.

Note: The latency of writepcsw is 1, i.e. the PCSW reflects the new value in the next cycle. However, it takes an additional 3 cycles for updates to the exception flags and exception enable bits to take effect in the hardware. Therefore, 3 delay slots / nops shall be inserted between writepcsw and the next interruptible jump if exception flags or enable bits are changed. This guarantees that the new state is recognized in the interrupt logic during execution of the ijump.

The writepcsw operation optionally takes a guard, specified in rguard. If a guard is present, its LSB controls the modification of PCSW. If the LSB of rguard is 1, PCSW is written; otherwise, PCSW is unchanged.

EXAMPLES
    Initial Values                          Operation                   Result
    r30 = 0x100, r40 = 0x180                writepcsw r30 r40           PCSW.IEEE MODE = to positive infinity
    r20 = 0, r50 = 0x0, r60 = 0x400         IF r20 writepcsw r50 r60    no change, since guard is false
    r21 = 1, r50 = 0x0, r60 = 0x400         IF r21 writepcsw r50 r60    PCSW.IEN = 0 (disable interrupts)
    r70 = 0x80110000, r80 = 0xffff0000      writepcsw r70 r80           enable trap on MSE, INV and DBZ exclusively

[Figure: PCSW layout. The exact bit positions are given in the original drawing; the fields are listed here.]
    PCSW<15:0>
        MSE          misaligned store exception
        WBE          write back error
        RSE          reserved exception
        CS           count stalls (1 ⇒ yes)
        IEN          interrupt enable (1 ⇒ allow interrupts)
        BSX          byte sex (1 ⇒ little endian)
        IEEE MODE    IEEE rounding mode: 0 ⇒ to nearest, 1 ⇒ to zero, 2 ⇒ to positive, 3 ⇒ to negative
        OFZ, IFZ, INV, OVF, UNF, INX, DBZ    FP exceptions
    PCSW<31:16>
        TRP MSE      misaligned store exception trap enable
        TRP WBE      write back error trap enable
        TRP RSE      reserved exception trap enable
        TFE          trap on first exit
        TRP OFZ, TRP IFZ, TRP INV, TRP OVF, TRP UNF, TRP INX, TRP DBZ    FP exception trap-enable bits
    UNDEF = undefined

SEE ALSO
    readpcsw fadd faddflags ijmpf cycles hicycles
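The masked update in the FUNCTION entry can be checked against the examples with a small C model. This is a sketch only: `model_writepcsw` and the two mask names are hypothetical, and the mask values are simply the rsrc2 values used in the examples above (0x400 for IEN, 0x180 for the IEEE rounding mode field). It does not model the hardware/software update coincidence or the 3-cycle delay described in the note.

```c
#include <assert.h>
#include <stdint.h>

/* Mask values copied from the rsrc2 operands in the examples;
 * the macro names are illustrative only. */
#define PCSW_IEN_MASK        0x00000400u  /* interrupt enable        */
#define PCSW_IEEE_MODE_MASK  0x00000180u  /* IEEE rounding mode field */

/* Hypothetical model: returns the new PCSW value.
 * Bits with a 0 in 'mask' keep their old value; bits with a 1 in
 * 'mask' are taken from 'src1'. A guard LSB of 0 leaves PCSW as is. */
static uint32_t model_writepcsw(uint32_t pcsw, uint32_t guard,
                                uint32_t src1, uint32_t mask)
{
    if (!(guard & 1u)) {
        return pcsw;
    }
    return (pcsw & ~mask) | (src1 & mask);
}
```

For instance, clearing IEN while preserving every other field is `model_writepcsw(pcsw, 1, 0x0, PCSW_IEN_MASK)`, which mirrors the third example.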
writespc
Write source program counter

SYNTAX
    [ IF rguard ] writespc rsrc1

FUNCTION
    if rguard then
        SPC ← rsrc1

ATTRIBUTES
    Function unit        fcomp
    Operation code       159
    Number of operands   1
    Modifier             No
    Modifier range       -
    Latency              1
    Issue slots          3

DESCRIPTION
The writespc operation copies the value of rsrc1 to the SPC (Source Program Counter) processor register. Whenever a hardware update (during an interruptible jump) and a software update (through a writespc) coincide, the software update takes precedence.

An interruptible jump that is not interrupted (no NMI, INT, or EXC event was pending when the jump was executed) writes its target address to SPC. The value of SPC is intended to allow an exception-handling routine to determine the start address of the block of scheduled code (called a decision tree) that was executing before the exception was taken.

The writespc operation optionally takes a guard, specified in rguard. If a guard is present, its LSB controls the modification of SPC. If the LSB of rguard is 1, SPC is written; otherwise, SPC is unchanged.

EXAMPLES
    Initial Values              Operation               Result
    r30 = 0xbeebee              writespc r30            SPC ← 0xbeebee
    r20 = 0, r31 = 0xabba       IF r20 writespc r31     no change, since guard is false
    r21 = 1, r31 = 0xabba       IF r21 writespc r31     SPC ← 0xabba

SEE ALSO
    readspc writedpc ijmpf ijmpi ijmpt
zex16
Zero extend 16 bits
pseudo-op for pack16lsb

SYNTAX
    [ IF rguard ] zex16 rsrc1 → rdest

FUNCTION
    if rguard then
        rdest ← zero_ext16to32(rsrc1<15:0>)

ATTRIBUTES
    Function unit        alu
    Operation code       53
    Number of operands   1
    Modifier             No
    Modifier range       -
    Latency              1
    Issue slots          1, 2, 3, 4, 5

DESCRIPTION
The zex16 operation is a pseudo operation transformed by the scheduler into a pack16lsb with 0 as the first argument and rsrc1 as the second. (Note: pseudo operations cannot be used in assembly source files.)

The zex16 operation zero extends the least-significant 16-bit halfword of the argument, rsrc1, to 32 bits and writes the result in rdest.

[Figure: bits 15..0 of rsrc1 (unsigned) are copied to bits 15..0 of rdest; bits 31..16 of rdest are set to 0.]

The zex16 operation optionally takes a guard, specified in rguard. If a guard is present, its LSB controls the modification of the destination register. If the LSB of rguard is 1, rdest is written; otherwise, rdest is not changed.

EXAMPLES
    Initial Values                  Operation                   Result
    r30 = 0xffff0040                zex16 r30 → r60             r60 ← 0x00000040
    r10 = 0, r40 = 0xff0fff91       IF r10 zex16 r40 → r70      no change, since guard is false
    r20 = 1, r40 = 0xff0fff91       IF r20 zex16 r40 → r100     r100 ← 0x0000ff91
    r50 = 0x00000091                zex16 r50 → r110            r110 ← 0x00000091

SEE ALSO
    sex16 sex8 zex8
zex8
Zero extend 8 bits
pseudo-op for ubytesel

SYNTAX
    [ IF rguard ] zex8 rsrc1 → rdest

FUNCTION
    if rguard then
        rdest ← zero_ext8to32(rsrc1<7:0>)

ATTRIBUTES
    Function unit        alu
    Operation code       55
    Number of operands   1
    Modifier             No
    Modifier range       -
    Latency              1
    Issue slots          1, 2, 3, 4, 5

DESCRIPTION
The zex8 operation is a pseudo operation transformed by the scheduler into a ubytesel with r0 (always contains 0) as the first argument and rsrc1 as the second. (Note: pseudo operations cannot be used in assembly source files.)

The zex8 operation zero extends the least-significant byte of the argument, rsrc1, to 32 bits and writes the result in rdest.

[Figure: bits 7..0 of rsrc1 (unsigned) are copied to bits 7..0 of rdest; bits 31..8 of rdest are set to 0.]

The zex8 operation optionally takes a guard, specified in rguard. If a guard is present, its LSB controls the modification of the destination register. If the LSB of rguard is 1, rdest is written; otherwise, rdest is not changed.

EXAMPLES
    Initial Values                  Operation                   Result
    r30 = 0xffff0040                zex8 r30 → r60              r60 ← 0x00000040
    r10 = 0, r40 = 0xff0fff91       IF r10 zex8 r40 → r70       no change, since guard is false
    r20 = 1, r40 = 0xff0fff91       IF r20 zex8 r40 → r100      r100 ← 0x00000091
    r50 = 0x00000091                zex8 r50 → r110             r110 ← 0x00000091

SEE ALSO
    ubytesel sex16 sex8 zex16
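In C terms, both zero-extension operations reduce to masking off the upper bits of the source. The sketch below uses hypothetical helper names (`model_zex16`, `model_zex8`) and ignores the guard; it is intended only as a way to check the example results above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical models of zex16 and zex8: keep the low halfword or
 * byte of the source and force all higher bits to zero. */
static uint32_t model_zex16(uint32_t src)
{
    return src & 0x0000FFFFu;
}

static uint32_t model_zex8(uint32_t src)
{
    return src & 0x000000FFu;
}
```

Applying either model to a value whose upper bits are already zero returns the value unchanged, as in the last example row of each table.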
preliminary specification b-1 mmio register summary chapter b by gert slavenburg, and selliah rathnam b.1 mmio registers the following table lists all the mmio registers implemente d in pnx1300/01/02/11. the registers are grouped accord- ing to the unit to which they belong. for compatibility with future devices, any undefined mmio bits should be ignored when read, and written as zeroes. mmio register name offset (in hex) accessibility description dspcpu external pci initiators dspcpu registers dram_base 10 0000 r/w r/w start of dram address aperture dram_limit 10 0004 r/w r/w end of dram address aperture mmio_base 10 0400 r/w r/w start of 2-mb mmio-register address aperture excvec 10 0800 r/w r/w interrupt vector (handl er start address) for exceptions isetting0 10 0810 r/w r/w interrupt mode & priority settings for sources 0-7 isetting1 10 0814 r/w r/w interrupt mode & priority settings for sources 8-15 isetting2 10 0818 r/w r/w interrupt mode & priority settings for sources 16-23 isetting3 10 081c r/w r/w interrupt mode & priority settings for sources 24-31 ipending 10 0820 r/w r/w interrupt-pending status bit for all 32 sources iclear 10 0824 r/w r/w interrupt-clear bit for all 32 sources imask 10 0828 r/w r/w interrupt-mask bit for all 32 sources intvec0 10 0880 r/w r/w interrupt vector ( handler start address) for source 0 intvec1 10 0884 r/w r/w interrupt vector ( handler start address) for source 1 intvec2 10 0888 r/w r/w interrupt vector ( handler start address) for source 2 intvec3 10 088c r/w r/w interrupt vector ( handler start address) for source 3 intvec4 10 0890 r/w r/w interrupt vector ( handler start address) for source 4 intvec5 10 0894 r/w r/w interrupt vector ( handler start address) for source 5 intvec6 10 0898 r/w r/w interrupt vector ( handler start address) for source 6 intvec7 10 089c r/w r/w interrupt vector ( handler start address) for source 7 intvec8 10 08a0 r/w r/w interrupt vector ( handler start address) for source 8 intvec9 10 08a4 
r/w r/w interrupt vector ( handler start address) for source 9 intvec10 10 08a8 r/w r/w interrupt vector ( handler start address) for source 10 intvec11 10 08ac r/w r/w interrupt vector ( handler start address) for source 11 intvec12 10 08b0 r/w r/w interrupt vector ( handler start address) for source 12 intvec13 10 08b4 r/w r/w interrupt vector ( handler start address) for source 13 intvec14 10 08b8 r/w r/w interrupt vector ( handler start address) for source 14 intvec15 10 08bc r/w r/w interrupt vector ( handler start address) for source 15 intvec16 10 08c0 r/w r/w interrupt vector (handler start address) for source 16 intvec17 10 08c4 r/w r/w interrupt vector (handler start address) for source 17 intvec18 10 08c8 r/w r/w interrupt vector (handler start address) for source 18 intvec19 10 08cc r/w r/w interrupt vector (handler start address) for source 19
pnx1300/01/02/11 data book philips semiconductors b-2 preliminary specification intvec20 10 08d0 r/w r/w interrupt vector ( handler start address) for source 20 intvec21 10 08d4 r/w r/w interrupt vector ( handler start address) for source 21 intvec22 10 08d8 r/w r/w interrupt vector ( handler start address) for source 22 intvec23 10 08dc r/w r/w interrupt vector ( handler start address) for source 23 intvec24 10 08e0 r/w r/w interrupt vector ( handler start address) for source 24 intvec25 10 08e4 r/w r/w interrupt vector ( handler start address) for source 25 intvec26 10 08e8 r/w r/w interrupt vector ( handler start address) for source 26 intvec27 10 08ec r/w r/w interrupt vector ( handler start address) for source 27 intvec28 10 08f0 r/w r/w interrupt vector (handler start address) for source 28 intvec29 10 08f4 r/w r/w interrupt vector (handler start address) for source 29 intvec30 10 08f8 r/w r/w interrupt vector (handler start address) for source 30 intvec31 10 08fc r/w r/w interrupt vector (handler start address) for source 31 timer1_tmodulus 10 0c00 r/w r/w contains: (maximum count value for timer 1) + 1 timer1_tvalue 10 0c04 r/w r/w current value of timer 1 counter timer1_tctl 10 0c08 r/w r/w timer 1 control (prescale value, source select, run bit) timer2_tmodulus 10 0c20 r/w r/w contains: (maximum count value for timer 2) + 1 timer2_tvalue 10 0c24 r/w r/w current value of timer 2 counter timer2_tctl 10 0c28 r/w r/w timer 2 control (prescale value, source select, run bit) timer3_tmodulus 10 0c40 r/w r/w contains: (maximum count value for timer 3) + 1 timer3_tvalue 10 0c44 r/w r/w current value of timer 3 counter timer3_tctl 10 0c48 r/w r/w timer 3 control (prescale value, source select, run bit) systimer_tmodulus 10 0c60 r/w r/w contains: (max imum count value fo r system timer) + 1 systimer_tvalue 10 0c64 r/w r/w current value of system timer/counter systimer_tctl 10 0c68 r/w r/w system timer contro l (prescale value, source select, run bit) bictl 10 1000 
r/w r/w instruction breakpoint control binstlow 10 1004 r/w r/w start of address range that causes inst ruction breakpoints binsthigh 10 1008 r/w r/w end of address range that causes instruction breakpoints bdctl 10 1020 r/w r/w data breakpoint control bdataalow 10 1030 r/w r/w start of addres s range that causes data breakpoints bdataahigh 10 1034 r/w r/w end of address range that causes data breakpoints bdataval 10 1038 r/w r/w compare value for data breakpoints bdatamask 10 103c r/w r/w compare mask for compare value for data breakpoints cache and memory system dram_cacheable_limit 10 0008 r/w r/w start of non-cacheable region in dram mem_events 10 000c r/w r/w selects two cache-related events for counting dc_lock_ctl 10 0010 r/w r/w enable bit for data-cache locking, also pci hole disable dc_lock_addr 10 0014 r/w r/w start of address range t hat will be locked into the data cache dc_lock_size 10 0018 r/w r/w size of address range that will be locked into the data cache dc_params 10 001c r/? r/? data-cache geometry (blocksize, associativity, # of sets) ic_params 10 0020 r/? r/? instruction-cache ge ometry (blocksize, assoc., # of sets) mm_config 10 0100 r/? r/? dram settings (rank size, bus width, refresh interval) arb_bw_ctl 10 0104 r/w r/w internal bus arbitrat ion control (bandwidth/latency allocation) arb_raise 10 010c r/w r/w arbiter priority raising timer power_down 10 0108 r/w r/w write to this register to initiate power down ic_lock_ctl 10 0210 r/w r/w enable bit for instruction-cache locking ic_lock_addr 10 0214 r/w r/w start of address range t hat will be locked in to the instruction cache mmio register name offset (in hex) accessibility description dspcpu external pci initiators
philips semiconductors mmio register summary preliminary specification b-3 ic_lock_size 10 0218 r/w r/w size of address range that will be locked into the instruction cache pll_ratios 10 0300 r/? r/? sets ratios of external and internal clock frequencies block_power_down 10 3428 r/w r/w powers up and down individual blocks video in vi_status 10 1400 r/? r/? st atus of video-in unit vi_ctl 10 1404 r/w r/w sets operation and interrupt modes for video in vi_clock 10 1408 r/w r/w sets clock s ource (internal/external), frequency vi_cap_start 10 140c r/w r/w sets capture start x and y offsets vi_cap_size 10 1410 r/w r/w sets capture size width and height vi_base1 vi_y_base_adr 10 1414 r/w r/w capture modes: sets base address of y-value array message/raw modes: sets base address of buffer 1 vi_base2 vi_u_base_adr 10 1418 r/w r/w capture modes: sets base address of u-value array message/raw modes: sets base address of buffer 2 vi_size vi_v_base_adr 10 141c r/w r/w capture modes: sets base address of v-value array message/raw modes: sets size of buffers vi_uv_delta 10 1420 r/w r/w capture modes: address delta for adjacent u, v lines vi_y_delta 10 1424 r/w r/w capture modes: address delta for adjacent y lines video out vo_status 10 1800 r/? r/? 
status of video-out unit vo_ctl 10 1804 r/w r/w sets operation and interrupt modes for video out vo_clock 10 1808 r/w r/w sets video-out clock frequency vo_frame 10 180c r/w r/w sets frame parameters (preset, start, length) vo_field 10 1810 r/w r/w sets field parameters (overlap, field-1 line, field-2 line) vo_line 10 1814 r/w r/w sets field paramete rs (starting pixel, frame width) vo_image 10 1818 r/w r/w sets image parameters (height, width) vo_ythr 10 181c r/w r/w sets threshold for ytr interrupt, image v/h offsets vo_olstart 10 1820 r/w r/w sets overlay image parameters (start line/pixel, alpha) vo_olhw 10 1824 r/w r/w sets overlay image parameters (height, width) vo_yadd 10 1828 r/w r/w sets y-component/buffer-1 starting address vo_uadd 10 182c r/w r/w sets u-component/buffer-2 starting address vo_vadd 10 1830 r/w r/w sets v-co mponent address/buffer-1 length vo_oladd 10 1834 r/w r/w sets overlay image address/buffer-2 length vo_vuf 10 1838 r/w r/w sets start-of-line- to-start-of-line address offsets (u, v) vo_yolf 10 183c r/w r/w sets start-of-line-to -start-of-line addr. offsets (y, overlay) evo_ctl 10 1840 r/w r/w sets operations for enhance video out evo_mask 10 1844 r/w r/w sets yuv mask values foe the chroma-key process evo_clip 10 1848 r/w r/w sets output clip values evo_key 10 184c r/w r/w sets yuv chroma-key values evo_slvdly 10 1850 r/w r/w sets delay cycles for genlock mode audio in ai_status 10 1c00 r/? r/? status of audio-in unit ai_ctl 10 1c04 r/w r/w sets operati on and interrupt modes for audio in ai_serial 10 1c08 r/w r/w sets clock rati os and internal/external clock generation ai_framing 10 1c0c r/w r/w sets format of serial data stream mmio register name offset (in hex) accessibility description dspcpu external pci initiators
pnx1300/01/02/11 data book philips semiconductors b-4 preliminary specification ai_freq 10 1c10 r/w r/w sets ai_osclk frequency ai_base1 10 1c14 r/w r/w sets base address of buffer 1 ai_base2 10 1c18 r/w r/w sets base address of buffer 2 ai_size 10 1c1c r/w r/w sets number of samples in buffers audio out ao_status 10 2000 r/? r/? status of audio-out unit ao_ctl 10 2004 r/w r/w sets operation and interrupt modes for audio out ao_serial 10 2008 r/w r/w sets clock ratios and internal/external clock generation ao_framing 10 200c r/w r/w sets format of serial data stream ao_freq 10 2010 r/w r/w set ao_osclk frequency ao_base1 10 2014 r/w r/w sets base address of buffer 1 ao_base2 10 2018 r/w r/w sets base address of buffer 2 ao_size 10 201c r/w r/w sets number of samples in buffers ao_cc 10 2020 r/w r/w codec control field values ao_cfc 10 2024 r/w r/w codec frame control ao_tstamp 10 2028 r/? r/w timestamp of the last buffer spdif out sdo_status 10 4c00 r/? r/? status register sdo_ctl 10 4c04 r/w r/w control register sdo_freq 10 4c08 r/w r/w frequency register sdo_base1 10 4c0c r/w r/w base address of buffer 1 sdo_base2 10 4c10 r/w r/w base address of buffer 2 sdo_size 10 4c14 r/w r/w number of samples in buffers sdo_tstamp 10 4c18 r/? r/? timestamp of the last buffer pci interface biu_status 10 3004 r/? r/? status of pci interface (done/busy bits, error bits) biu_ctl 10 3008 r/w r/w sets operation and interrupt modes for pci pci_adr 10 300c r/w ?/? holds address for dspcpu pci access pci_data 10 3010 r/w ?/? 
holds data for dspcpu pci access config_adr 10 3014 r/w r/w holds address for configuration access config_data 10 3018 r/w r/w holds data for configuration access config_ctl 10 301c r/w r/w sets read/write, bus number for configuration access io_adr 10 3020 r/w r/w holds address for i/o access io_data 10 3024 r/w r/w holds data for i/o access io_ctl 10 3028 r/w r/w sets read/writ e, byte-enable for i/o access src_adr 10 302c r/w r/w holds source address for dma operation dest_adr 10 3030 r/w r/w holds destination address for dma operation dma_ctl 10 3034 r/w r/w sets read/write, transfer length for dma operation int_ctl 10 3038 r/w r/w controls interrupt system xio_ctl 10 3060 r/w r/w xio control register jtag jtag_data_in 10 3800 r/w r/w jtag data input buffer jtag_data_out 10 3804 r/w r/w jtag data output buffer jtag_ctl 10 3808 r/w r/w jtag control image co-processor mmio register name offset (in hex) accessibility description dspcpu external pci initiators
philips semiconductors mmio register summary preliminary specification b-5 icp_mpc 10 2400 r/w r/w microprogram counter icp_mir 10 2404 r/w r/w micr o instruction register icp_dp 10 2408 r/w r/w data pointer icp_dr 10 2410 r/w r/w data register icp_sr 10 2414 r/w r/w status register vld co-processor vld_command 10 2800 r/w r/w next action to be taken by vld vld_sr 10 2804 r/? r/? bitstream shift register vld_qs 10 2808 r/w r/w quantization scale code vld_pi 10 280c r/w r/w picture layer information vld_status 10 2810 r/w r/w status register vld_imask 10 2814 r/w r/w controls which status bits causes vld interrupts vld_ctl 10 2818 r/w r/w control register vld_bit_adr 10 281c r/w r/w current bitstream read address vld_bit_cnt 10 2820 r/w r/w bitstream remaining byte count vld_mbh_adr 10 2824 r/w r/w macro block header output address vld_mbh_cnt 10 2828 r/w r/w macro block header output remaining count vld_rl_adr 10 282c r/w r/w run/length output address vld_rl_cnt 10 2830 r/w r/w run/length output remaining count i 2 c interface iic_ar 10 3400 r/w r/w address, byte count and direction iic_dr 10 3404 r/w r/w data register iic_status 10 3408 r/? r/? status register iic_ctl 10 340c r/w r/w control register synchronous serial interface ssi_ctl 10 2c00 r/w r/w control register ssi_csr 10 2c04 r/w r/w additional control and status register ssi_txdr 10 2c10 ?/w ?/w transmit data register ssi_rxdr 10 2c20 r/? r/? receive data register ssi_rxack 10 2c24 ?/w ?/w write a ?1? here to ack read of receive data register sem device sem 10 0500 r/w r/w simple multi-processor semaphore mmio register name offset (in hex) accessibility description dspcpu external pci initiators
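As an illustration of how the offsets in this table might be used, the following C sketch computes a register address and applies the rule from the start of this chapter that undefined bits are written as zeroes. It assumes that each offset is added to the start address of the MMIO aperture (the value held in MMIO_BASE); the helper names, the enum constants and the `defined_bits` parameter are hypothetical, and the per-register set of defined bits must be taken from the relevant chapter.

```c
#include <assert.h>
#include <stdint.h>

/* A few offsets copied from the table above (hex). */
enum {
    MMIO_OFF_DRAM_BASE = 0x100000,
    MMIO_OFF_MMIO_BASE = 0x100400,
    MMIO_OFF_BIU_CTL   = 0x103008
};

/* Hypothetical helper: address of a register, assuming the table
 * offset is relative to the start of the MMIO aperture. */
static uintptr_t mmio_reg_addr(uintptr_t mmio_aperture_start, uint32_t offset)
{
    return mmio_aperture_start + offset;
}

/* Hypothetical helper: clear every bit that is not defined for the
 * register, so undefined bits are always written as zeroes. */
static uint32_t mmio_value_for_write(uint32_t value, uint32_t defined_bits)
{
    return value & defined_bits;
}
```

On real hardware the resulting address would be accessed through a `volatile uint32_t *`; that step is omitted here so the sketch stays testable on a host machine.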
Endian-ness                                                        Appendix C
by Selliah Rathnam, Luis Lucas

C.1 Purpose

In this document, the generic PNX1300 name refers to the PNX1300 Series, or the PNX1300/01/02/11 products.

PNX1300 was designed to support both little and big endian systems. The PCI system bus (controlled by the PCI Interface Unit (BIU)) operates in little endian mode in both systems. This document describes how the dual endian-ness feature is handled in PNX1300.

C.2 Little and Big Endian Addressing Conventions

In big endian mode, a given word address (32-bit) base corresponds to the most significant byte (MSB) of the word. Increasing the byte address generally means decreasing the significance of the byte being accessed. In little endian mode, the same word address base refers to the least significant byte (LSB) of that word. Increasing the byte address generally means increasing the significance of the byte being accessed. This addressing convention is shown in Figure C-1.

Figure C-1 uses two lines of C code that assign a 32-bit constant in hex format to the variable 'w' (assuming 'int' is 32 bits) and copy its address into the byte (character) pointer variable 'cp'. The byte referenced by 'cp' has the value 0x04 on a big endian machine and the value 0x07 on a little endian machine.

It is possible to convert from one endian-ness to the other just by swapping the bytes within a word, as shown in Figure C-2.

    int w = 0x04050607;
    char *cp = (char *)&w;

    Word bits 31..0         04      05      06      07
    Big endian mode         cp+0    cp+1    cp+2    cp+3
    Little endian mode      cp+3    cp+2    cp+1    cp+0

Figure C-1. Big and little endian address references

    int w = 0x04050607;
    char *cp = (char *)&w;

    Big endian mode
        Word bits 31..0     04      05      06      07
        Byte address        cp+0    cp+1    cp+2    cp+3
    Little endian mode (after byte swap)
        Word bits 31..0     07      06      05      04
        Byte address        cp+3    cp+2    cp+1    cp+0

Figure C-2. Data conversion from big endian to little endian (BSW)
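The two figures can be reproduced on any host with a short C sketch. The helper names below (`bswap32`, `first_byte_in_memory`, `host_is_little_endian`) are illustrative and not taken from any PNX1300 software; the code only demonstrates the addressing convention and the within-word byte swap.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Swap the four bytes of a 32-bit word (the conversion of Figure C-2). */
static uint32_t bswap32(uint32_t w)
{
    return ((w & 0x000000FFu) << 24) |
           ((w & 0x0000FF00u) << 8)  |
           ((w & 0x00FF0000u) >> 8)  |
           ((w & 0xFF000000u) >> 24);
}

/* Return the byte stored at the lowest address of 'w', i.e. what
 * *cp yields in Figure C-1. */
static unsigned first_byte_in_memory(uint32_t w)
{
    unsigned char bytes[sizeof w];
    memcpy(bytes, &w, sizeof w);
    return bytes[0];
}

/* Independent probe of the host byte order. */
static int host_is_little_endian(void)
{
    uint16_t probe = 1;
    unsigned char low;
    memcpy(&low, &probe, 1);
    return low == 1;
}
```

With w = 0x04050607, `first_byte_in_memory(w)` is 0x07 on a little endian host and 0x04 on a big endian host, and applying `bswap32` twice returns the original word.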
C.3 Test to Verify the Correct Operation of PNX1300 in Big and Little Endian Systems

The following test can be used to verify the correct operation of PNX1300 in little endian and big endian systems.

1. Store a 32-bit constant '0x04050607' from the host CPU to the PNX1300 SDRAM through the PCI interface. Load the word from the same address to one of the PNX1300's global registers and check for the same value.
2. Store a 32-bit constant '0x04050607' from the host CPU to the PNX1300 SDRAM through the PCI interface. Load a byte from the same address to one of the PNX1300 global registers. Check for the value '0x04' in big endian systems, and check for the value '0x07' in little endian systems.

C.4 Requirement for the PNX1300 to Operate in Either Little Endian or Big Endian Mode

The endian-ness handling in each PNX1300 unit is described in the following sections. Most units use the highway/PCI bus to transfer data. The highway/PCI bus has four byte lanes. The bit assignment of the highway/PCI bus lanes is shown in Table C-2.

The PCI bus and the PNX1300 highway bus are address-invariant buses, i.e. the data corresponding to address offset '0' uses the byte-0 lane of the highway/PCI bus, the data corresponding to address offset '1' uses the byte-1 lane of the highway/PCI bus, etc.

C.4.1 Data Cache

The PNX1300 PCSW register has a byte-sex (BSX) bit to configure the PNX1300 in big endian or little endian mode. This bit must be set to '1' for the little endian mode as defined in Chapter 3, "DSPCPU Architecture." This BSX bit is used by the PNX1300 data cache unit for the store/load operation. The data cache performs three categories of data transactions:

• Read/write data from/to DSPCPU registers to/from data cache or SDRAM
• Read/write of MMIO data from/to DSPCPU registers to/from MMIO registers
• Read/write data from/to DSPCPU registers to/from PCI address space through special registers in the BIU unit.

The DSPCPU endian-ness is determined by the value of the BSX bit in the PCSW register. Table C-1 and Table C-3 describe the data translation format used by the data cache to transfer data to/from a DSPCPU register to/from data cache or SDRAM. Table C-1 and Table C-3 are restricted to addresses that fall in the DRAM_BASE and DRAM_LIMIT range.

There is no byte-swap required for the MMIO data transaction from/to a DSPCPU register to the MMIO registers. However, one of the special registers, PCI_DATA, does not follow the normal MMIO transactions. The data cache byte-swaps the data to/from the PCI_DATA register using the data translation format defined in Table C-1 and Table C-3 for the memory cycle.

For the PCI configuration cycle and I/O cycle transactions from the DSPCPU, a programmer can byte-swap the data in the DSPCPU registers and write to the PCI_DATA register using MMIO write operations. There is no byte-swap from the PCI_DATA register in the BIU unit to the PCI bus. Software uses the Table C-1 or Table C-3 data to byte-swap the data within the CPU register before writing the data to the PCI_DATA register for the configuration and I/O cycle transactions.

Table C-1. Little endian data format in PNX1300 DSPCPU register, highway, SDRAM memory, PCI bus, host memory, host CPU register

    Column key:
      DSPCPU reg      Data in DSPCPU register (MSB..LSB)
      Highway         Data in highway/dcache/SDRAM/PCI-bus (byte3..byte0, [31:24]..[7:0])
      Host CPU reg    Data in host CPU register (MSB..LSB)
      Host memory     Data in host memory (byte3..byte0, [31:24]..[7:0])

    PCSW-BSX  Endian  Data transaction   Address    DSPCPU    Highway   Host CPU  Host
    value     mode    type                          reg                 reg       memory
    1         little  word r/w           00001000   01020304  01020304  01020304  01020304
    1         little  half-word r/w      00001000   xxxx0304  xxxx0304  xxxx0304  xxxx0304
    1         little  half-word r/w      00001002   xxxx0304  0304xxxx  xxxx0304  0304xxxx
    1         little  byte read/write    00001000   xxxxxx04  xxxxxx04  xxxxxx04  xxxxxx04
    1         little  byte read/write    00001001   xxxxxx04  xxxx04xx  xxxxxx04  xxxx04xx
    1         little  byte read/write    00001002   xxxxxx04  xx04xxxx  xxxxxx04  xx04xxxx
    1         little  byte read/write    00001003   xxxxxx04  04xxxxxx  xxxxxx04  04xxxxxx

Table C-2. Bit assignment of the highway/PCI bus lanes

            Byte 3    Byte 2    Byte 1    Byte 0
    Bits    31:24     23:16     15:8      7:0
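The "data in highway/dcache/SDRAM/PCI-bus" columns of Table C-1 and Table C-3 follow from the address-invariant lane rule: the byte at address offset k always travels on byte lane k. The C sketch below reproduces those columns for naturally aligned accesses. The function name `highway_lanes` and its parameters are hypothetical, and unused lanes (shown as 'xx' in the tables) are returned as zero.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: place the low 'size' bytes of a DSPCPU register
 * value on the four byte lanes of an address-invariant bus.
 *   size: 1, 2 or 4 bytes, naturally aligned
 *   addr: byte address of the access (only the low 2 bits matter)
 *   bsx : PCSW.BSX, 1 = little endian, 0 = big endian
 * Lane k occupies bits [8k+7 : 8k] of the returned word. */
static uint32_t highway_lanes(uint32_t reg, unsigned size,
                              unsigned addr, int bsx)
{
    unsigned off = addr & 3u;
    uint32_t lanes = 0;

    assert(size == 1 || size == 2 || size == 4);
    assert(off % size == 0);

    for (unsigned i = 0; i < size; i++) {
        /* i walks the bytes in increasing address order:
         * little endian starts at the LSB, big endian at the MSB. */
        unsigned shift = bsx ? 8u * i : 8u * (size - 1u - i);
        uint32_t byte = (reg >> shift) & 0xFFu;
        lanes |= byte << (8u * (off + i));
    }
    return lanes;
}
```

For example, a big endian word store of 0x01020304 yields 0x04030201 on the lanes, which matches the first row of Table C-3.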
C.4.2 Instruction Cache

It is assumed that the instruction cache always operates in little endian mode regardless of the host and PNX1300 endian-ness. The instruction cache does not use the PCSW's byte sex bit (BSX). The compiler supports the loading of instructions in memory differently for big endian and little endian modes.

C.4.3 PNX1300 PCI Interface Unit

The PNX1300 highway bus and the PCI bus are address-invariant buses, i.e. data corresponding to address zero is always transferred through the byte-zero lane regardless of the endian-ness. The address-invariant nature of the PCI and highway buses allows data to be transferred from/to the PCI bus directly to/from SDRAM without byte swapping in either big or little endian mode. The byte swapping of data for big endian mode is performed by the data cache unit. However, MMIO data does not go through the byte swapper in the data cache. Consequently, a byte swapper in the BIU is used to byte-swap the MMIO data in big endian mode.

The PNX1300 BIU has a separate byte sex (SE, Swap Enabled) flag defined in its control register (BIU_CTL). This byte-sex flag must be set by software, i.e. by an MMIO write operation from the host CPU. The flag is used only for MMIO data accesses; none of the non-MMIO data accesses is affected by the SE flag. Table C-4 shows the byte-swap logic that handles the MMIO accesses from the DSPCPU and host CPU and the non-MMIO data accesses from any source.

The BIU has several special registers to handle memory, PCI configuration, I/O and DMA accesses. It does not byte-swap the I/O data from the special registers. The data cache and software perform the necessary byte swapping for this data.
when using pnx1300 in little endian-based systems, the first transaction to the pnx1300 is to set the se bit in the biu configuration register to avoid unnecessary soft- ware byte-swapping in the host cpu for the subsequent mmio read/write accesses. the se bit in the biu_ctl register controls the byte sw apping of outgoing and in- coming data from pci bus. the default value of se is ?0?, i.e the biu byte-swaps the mmio data including the write operation to the biu_ctl register. software is required to byte swap the biu_ctl register value within the host cpu before storing the value in biu_ctl register. once, the biu.se bit has been set, no additional software byte- swapping is required for further read/write operations to any mmio registers. c.4.4 image coprocessor (icp) the input source data for the icp unit might come from different units such as video in, the dspcpu, pci bus, etc. via sdram. data cons istency needs to be main- tained when the pnx1300 operates in little or big endi- an systems/mode. the icp nee ds the capability to oper- ate on the sdram as sour ce data and sdram or pci as destination data in either little or big endian mode. figure c-3 , figure c-4 , figure c-5 and figure c-6 illus- trate the big and little endian memory image format for the image input format ( figure c-3 ) and the three sup- ported image overlay formats. the icp can output the data to either the sdram or pci bus. rgb 8r and rgb 8a pixel formats are byte streams and therefore do not require any byte swapping. figure c-9 pictures the data format. rgb-24 + , rgb-15 + , rgb-16 and yuv-4:2:2 pixel formats can be used to out- put the pixels to pci or sdram in both endian modes. output formats are shown, respectively, in figure c-4 , figure c-5 , figure c-8 , and figure c-7 . packed rgb-24 cannot be used in big endian mode. little endian data format is shown in figure c-11 . table c-3. 
Big endian data format in the PNX1300 DSPCPU register, highway, SDRAM memory, PCI bus, host memory, and host CPU register

PCSW.BSX  Endian  Data transaction  Address   Data in DSPCPU  Data in highway/     Data in host  Data in host
value     mode    type                        register        DCache/SDRAM/PCI bus CPU register  memory
                                              (MSB..LSB)      (byte3..byte0)       (MSB..LSB)    (byte0..byte3)
--------  ------  ----------------  --------  --------------  -------------------- ------------  --------------
0         big     word R/W          00001000  01020304        04030201             01020304      01020304
0         big     half-word R/W     00001000  xxxx0304        xxxx0403             xxxx0304      0304xxxx
0         big     half-word R/W     00001002  xxxx0304        0403xxxx             xxxx0304      xxxx0304
0         big     byte read/write   00001000  xxxxxx04        xxxxxx04             xxxxxx04      04xxxxxx
0         big     byte read/write   00001001  xxxxxx04        xxxx04xx             xxxxxx04      xx04xxxx
0         big     byte read/write   00001002  xxxxxx04        xx04xxxx             xxxxxx04      xxxx04xx
0         big     byte read/write   00001003  xxxxxx04        04xxxxxx             xxxxxx04      xxxxxx04

Table C-4. BIU.SE bit usage in processing data in the BIU unit

BIU.SE  Endian  MMIO access    MMIO access    Non-MMIO
value   mode    from DSPCPU    from PCI side  data
------  ------  -------------  -------------  ------------
0       big     no byte-swap   byte-swap      no byte-swap
1       little  no byte-swap   no byte-swap   no byte-swap
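The highway/DCache/SDRAM/PCI-bus column of Table C-3 follows from address invariance: the byte at address A+n always travels on byte lane n. The sketch below reproduces that column for a big endian access; the function name and the "xx" notation for unused lanes are illustrative choices, not part of the data book.

```python
def highway_image_big_endian(address: int, size: int, value: int) -> str:
    """Return the 32-bit bus image (byte3 first, byte0 last) of a big
    endian access of `size` bytes at `address`; unused lanes are 'xx'."""
    lanes = ["xx"] * 4
    # Big endian: the most significant byte lives at the lowest address.
    for offset, byte in enumerate(value.to_bytes(size, "big")):
        lane = (address + offset) & 3  # address-invariant lane selection
        lanes[lane] = f"{byte:02x}"
    return "".join(reversed(lanes))


word_image = highway_image_big_endian(0x00001000, 4, 0x01020304)
half_image = highway_image_big_endian(0x00001002, 2, 0x0304)
```

Under these assumptions `word_image` is "04030201" and `half_image` is "0403xxxx", matching the corresponding rows of Table C-3.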
Figure C-3. Byte mask, planar YUV 4:2:0 and YUV 4:2:2 for ICP, VO or VI memory data in little and big endian modes. (Figure not reproduced: Y pixel byte data Y0 to Y7 laid out over byte lanes A+0 to A+3, bits 31 to 0, in big and little endian modes; the same applies to U, V and B.)
Note: A+0 corresponds to the byte-0 lane of SDRAM/HWY and A+3 corresponds to the byte-3 lane of SDRAM/HWY.

Figure C-4. RGB-24+ data format for ICP in little and big endian modes. (Figure not reproduced: pixel word data, one pixel per word holding an overlay byte plus R, G and B, laid out over byte lanes A+0 to A+3 in memory or PCI for both endian modes.)
Note: A+0 corresponds to the byte-0 lane of SDRAM/HWY/PCI and A+3 corresponds to the byte-3 lane of SDRAM/HWY/PCI.

Figure C-5. RGB-15+ data format for ICP in little and big endian modes. (Figure not reproduced: pixel half-word data, pixels Pn and Pn+1 per word, laid out over byte lanes A+0 to A+3 in memory or PCI for both endian modes.)
Note: A+0 corresponds to the byte-0 lane of SDRAM/HWY/PCI and A+3 corresponds to the byte-3 lane of SDRAM/HWY/PCI.
Figure C-6. Packed YUV 4:2:2+ data format for the ICP or VO in little and big endian modes. (Figure not reproduced: pixel half-word data, pixels Pn and Pn+1 per word, with U0, Y0, V0, Y1, U1, Y2, V1, Y3 components laid out over byte lanes A+0 to A+3 in memory or PCI for both endian modes.)
Note: A+0 corresponds to the byte-0 lane of SDRAM/HWY/PCI and A+3 corresponds to the byte-3 lane of SDRAM/HWY/PCI.

Figure C-7. Packed YUV 4:2:2 data format for ICP in little and big endian modes. (Figure not reproduced: pixel half-word data, pixels Pn and Pn+1 per word, with U0, Y0, V0, Y1, U1, Y2, V1, Y3 components laid out over byte lanes A+0 to A+3 in memory or PCI for both endian modes.)
Note: A+0 corresponds to the byte-0 lane of SDRAM/HWY/PCI and A+3 corresponds to the byte-3 lane of SDRAM/HWY/PCI.

Figure C-8. RGB-16 data format for ICP in little and big endian modes. (Figure not reproduced: pixel half-word data, pixels Pn and Pn+1 per word, laid out over byte lanes A+0 to A+3 in memory or PCI for both endian modes.)
Note: A+0 corresponds to the byte-0 lane of SDRAM/HWY/PCI and A+3 corresponds to the byte-3 lane of SDRAM/HWY/PCI.
Figure C-9. RGB 8A and RGB 8R data format for ICP in little and big endian modes. (Figure not reproduced: RGB 8A or 8R pixels P0 to P7 laid out over byte lanes A+0 to A+3, bits 31 to 0, in memory or PCI for both endian modes.)
Note: A+0 corresponds to the byte-zero lane of SDRAM/HWY/PCI and A+3 corresponds to the byte-three lane of SDRAM/HWY/PCI.

Figure C-10. Half-word swap within a half-word (BSH). (Figure not reproduced: a 32-bit word, bits 31 to 0, holding bytes 04 05 06 07 on one side of the swap and 05 04 07 06 on the other.)

Figure C-11. Packed RGB-24 data format for ICP in little endian mode only. (Figure not reproduced: pixel word data with B0, G0, R0, B1, G1, R1, B2, G2, R2, B3, G3, R3 packed over byte lanes A+0 to A+3 in memory or PCI for little endian mode; big endian mode is marked as not supported.)
Note: A+0 corresponds to the byte-zero lane of SDRAM/HWY/PCI and A+3 corresponds to the byte-three lane of SDRAM/HWY/PCI.
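The BSH swap of Figure C-10 exchanges the two bytes inside each 16-bit half of a word. A small sketch of it follows, together with a BSW helper; Figure C-2 is not part of this section, so treating BSW as a full four-byte reversal is an assumption, and both function names are illustrative.

```python
def bsh(word: int) -> int:
    """Swap the two bytes inside each 16-bit half of a 32-bit word."""
    return ((word & 0x00FF00FF) << 8) | ((word >> 8) & 0x00FF00FF)


def bsw(word: int) -> int:
    """Reverse all four bytes of a 32-bit word (assumed meaning of BSW)."""
    return int.from_bytes((word & 0xFFFFFFFF).to_bytes(4, "big"), "little")


swapped = bsh(0x04050607)
```

With the byte values of Figure C-10, `swapped` is 0x05040706. Both helpers are their own inverse, so applying the same swap twice restores the original word.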
Table C-5 and Table C-6 show the byte-swap implementation of the various pixel formats used in the ICP unit. Refer to Figure C-2 and Figure C-10 for the byte-swap codes used in Table C-5 and Table C-6. Byte-swapping is performed only in big endian mode. No swapping is done in little endian mode. The ICP has a byte sex bit (L) defined in its MMIO-based configuration register. The setting of this bit and the BSX bit in the PCSW register should be the same. The L bit must be set by software.

C.4.5 Video In (VI) and Video Out (VO) Units

The VI unit stores the YUV pixels in planar 4:2:2 or 4:2:0 image format as shown in Figure C-3 and stores the raw 8- and 10-bit data as shown in Figure C-12.

The VO unit uses YUV-4:2:2 planar, YUV-4:2:0 planar, and YUV-4:2:2+ packed as input pixel formats. The planar memory image formats of YUV-4:2:2 and YUV-4:2:0 are shown in Figure C-3. The YUV-4:2:2+ memory image format for overlay is pictured in Figure C-6.

The VI and VO units have a byte sex bit (LITTLE_ENDIAN and LTL_END) defined in the control MMIO registers, VI_CONTROL and VO_CONTROL. The definition of these byte sex bits and the BSX bit in the PCSW register should be treated as the same. The LITTLE_ENDIAN and LTL_END bits must be set by software.

C.4.6 Audio In (AI), Audio Out (AO), and SPDIF Out (SPDO) Units

The AI unit uses 8-bit mono, 8-bit stereo, 16-bit mono and 16-bit stereo data. The AO unit uses 16-bit mono, 16-bit stereo, 32-bit mono and 32-bit stereo data. The SPDO unit uses 32-bit word data. The memory image format of these data is presented in Figure C-13. Swapping takes place at the byte level and the bits within a byte are never disturbed. Both the AI and AO units have a byte sex bit (LITTLE_ENDIAN) defined in each unit's MMIO-based configuration register. The definition of these bits and the BSX bit in the PCSW register should be treated as the same.
This byte sex bit must be set by the software.

Table C-5. ICP byte swapping type for input data

Endian-ness  L bit  Pixel type     Swap type (see Figure C-2 & Figure C-10)
-----------  -----  -------------  ----------------------------------------
big endian   0      Y,U,V planar   no swap
big endian   0      RGB 24+        BSW
big endian   0      YUV-4:2:2+     BSH
big endian   0      RGB 15+        BSH

Table C-6. ICP byte swapping type for output data

Endian-ness  L bit  Pixel type        Swap type (see Figure C-2 & Figure C-10)
-----------  -----  ----------------  ----------------------------------------
big endian   0      RGB 8A: 233       no swap
big endian   0      RGB 8R: 332       no swap
big endian   0      RGB 15+           BSH
big endian   0      RGB 16            BSH
big endian   0      RGB 24+           BSW
big endian   0      RGB24 packed      no support for big endian
big endian   0      YUV-4:2:2 packed  BSH

Figure C-12. Memory image format for raw 8-bit and 10-bit data. (Figure not reproduced: raw 8-bit data Dn to Dn+3, one byte per lane, and raw 10-bit data Dn and Dn+1, each split into MSB and LSB bytes, laid out over byte lanes A+0 to A+3 in memory for big and little endian modes.)
Note: A+0 corresponds to the byte-0 lane of SDRAM/HWY and A+3 corresponds to the byte-3 lane of SDRAM/HWY. LSB is the least significant byte; MSB is the most significant byte.

C.4.7 Variable Length Decoder (VLD) Unit

The VLD inputs data from SDRAM in the form of a bitstream with a byte-aligned starting address and outputs a header stream and a 'run-level' data stream. The VLD unit has a byte sex bit (LITTLE_ENDIAN) defined in its MMIO-based configuration register. The definition of this
bit and the BSX bit in the PCSW register should be the same. This byte sex bit must be set by the software.

Figure C-14 describes the VLD input and output data format as seen in the SDRAM and on the highway bus. The input data is byte oriented and no swapping is required in the VLD unit. However, the output data is read by the DSPCPU in words, so the VLD needs to swap the output bytes within a word (shown in Figure C-14) to compensate for the CPU swap.

C.4.8 Synchronous Serial Interface (SSI)

The SSI unit has I/O connections through the external serial pins and also to the internal 32-bit data highway via MMIO transactions. The minimum quantity of data to be analyzed by the CPU is 16 bits (i.e. one half-word). The SSI uses a 16-bit or 1-bit endian-ness; it is detailed in Section 17.8 on page 17-7. The 32-bit quantity contained in the CPU register is written or read 'as is' into/from the SSI MMIO register. The EMS bit in SSI_CTL determines which half-word (16 bits) is sent first, as pictured in Figure C-15.

Figure C-13. Memory image format for audio data. (Figure not reproduced: 8-bit mono samples Ln to Ln+3, 8-bit stereo samples Ln, Rn, Ln+1, Rn+1, 16-bit mono samples Ln and Ln+1, 16-bit stereo samples Ln and Rn, and 32-bit data, each split into MSB and LSB bytes where applicable, laid out over byte lanes A+0 to A+3 in memory for big and little endian modes.)
Note: A+0 corresponds to the byte-zero lane of SDRAM/HWY and A+3 corresponds to the byte-three lane of SDRAM/HWY. LSB is the least significant byte; MSB is the most significant byte.

Figure C-15.
SSI data format as seen in highway. (Figure not reproduced: 16-bit half-word data Dn and Dn+1 in CPU/MMIOs, each split into MSB and LSB bytes, laid out over byte lanes A+0 to A+3 for SSI_CTL.EMS = 0 and SSI_CTL.EMS = 1.)
Note: A+0 corresponds to the byte-0 lane of CPU/HWY and A+3 corresponds to the byte-3 lane of CPU/HWY. LSB is the least significant byte; MSB is the most significant byte.
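The half-word ordering that C.4.8 attributes to the EMS bit can be illustrated with a small helper. This is only a sketch: the text here does not say which EMS value selects which order, so the helper takes an explicit `upper_first` flag instead of an EMS value, and the function name is hypothetical.

```python
def ssi_halfword_order(word: int, upper_first: bool) -> tuple[int, int]:
    """Split a 32-bit word into its two 16-bit half-words and return them
    in transmission order. How SSI_CTL.EMS maps onto `upper_first` is not
    stated in this section and is left to the caller."""
    upper = (word >> 16) & 0xFFFF
    lower = word & 0xFFFF
    return (upper, lower) if upper_first else (lower, upper)


first, second = ssi_halfword_order(0x12345678, upper_first=True)
```

The word itself is passed through unchanged, in line with the statement that the CPU register contents are written or read 'as is'; only the order of the two half-words differs.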
C.4.9 Compiler

The TCS compiler supports loading instructions into memory differently for big endian and little endian modes.

C.5 Summary

The PNX1300 is required to operate in the same endian-ness as the host CPU. At reset, the PNX1300 operates in big endian mode; no special steps are required to set the endian bits. When using the PNX1300 in little endian systems, the first transaction is to set the SE bit in the BIU_CTL register as described in the second paragraph of Section 11.6.5 on page 11-11.

C.6 References

1. PCI Multimedia Design Guide, Revision 1.0, dated March 29, 1994.
2. Designing PCI Cards and Drivers for Power Macintosh Computers, by Apple Computer, Inc.; reference: R0650LL/A; phone: 1-800-282-2732.

Figure C-14. VLD input and output data format. (Figure not reproduced: input data bytes N to N+3, the header output for header = 0x12345678, and the run-level output for run value = 0x1234 and level value = 0x5678 at word address A, laid out over byte lanes A+0 to A+3 for big and little endian modes.)
Note: A+0 corresponds to the byte-0 lane of SDRAM/HWY and A+3 corresponds to the byte-3 lane of SDRAM/HWY.
abc h e dfg ijklmnopqrstuvwxyz preliminary specification index-1 index numerics 12nc 1-10 a a/d converter 8-1 absolute maximum ratings 1-12 ac characteristics 1-12 address fields,instruction cache 5-8 address lines driving capacity 12-7 address mapping based on rank size 12-5 , 12-6 dram memory system 12-5 instruction cache 5-8 picture 5-9 addressing modes 3-4 ai_base1 picture 8-5 ai_base2 picture 8-5 ai_control field description table 8-6 ai_ctl picture 8-5 ai_framing picture 8-5 ai_freq picture 8-5 ai_osclk description table 8-1 ai_sck description table 8-1 ai_sd description table 8-1 ai_serial picture 8-5 ai_size picture 8-5 ai_status field description table 8-6 picture 8-5 ai_ws description table 8-1 algorithms image processing 14-6 of enhanced video out unit 7-10 algorithms, icp 14-6 alignment 5-4 alloc a-4 allocate on write 5-4 allocd a-5 allocr a-6 allocx a-7 alpha blending codes 14-5 byte for alpha blending 14-5 keying 14-9 registers 14-5 alpha blending 7-13 , 14-1 , 14-9 alpha blending codes 14-5 table 14-5 alpha value for overlay pixel 14-9 ao_base1 picture 9-8 ao_base2 picture 9-8 ao_cc picture 9-8 ao_cfc picture 9-8 ao_control field description table 9-9 , 9-10 ao_ctl picture 9-8 ao_framing picture 9-8 ao_freq picture 9-8 ao_osclk description table 9-2 ao_sck description table 9-2 ao_serial picture 9-8 ao_size picture 9-8 ao_status field description table 9-9 picture 9-8 , 16-2 aperture dram 5-2 memory 12-1 pci 11-2 aperture,pci 5-5 aperture_control field 5-5 asi a-8
abc h e dfg ijklmnopqrstuvwxyz index-2 prelimin ary specification asli a-9 asr a-10 asri a-11 audio capture 8-5 audio codec 8-1 , 8-3 audio in unit diagnostic mode 8-7 memory data formats 8-4 audio input 8-1 audio memory format 8-4 audio out unit memory data formats 9-7 audio sample rate 8-2 audio test 8-7 b bandwidth requirements of icp 14-1 base address pci interface registers 11-7 bdataahigh picture 3-14 bdataalow picture 3-14 bdatamask picture 3-14 bdataval picture 3-14 bdctl picture 3-14 bictl picture 3-14 binary compatibility 3-4 binsthigh picture 3-14 binstlow picture 3-14 bit masking 14-28 bitand a-12 bitandinv a-13 bitinv a-14 bitmap masking 14-1 bitor a-15 bitxor a-16 biu_ctl pci interface mmio register 11-11 picture 11-10 biu_status pci interface mmio register 11-11 picture 11-10 blending alpha 14-1 blending codes alpha blending 14-5 block timing pci output 14-16 boolean representation 3-3 borrow a-17 boundary scan 1-1 breakpoints 3-13 built-in self test pci interface register 11-7 byte ordering dspcpu 3-2 bytesex 3-2 c cache address mapping,instruction cache 5-8 alignment 5-3 , 5-4 associativity 5-3 bandwidth requirements 5-1 block size 5-3 blocksize 5-3 byte in word 5-3 coherency 5-3 , 5-4 , 5-11 copyback 5-4 copyback operation 5-6 cpu stall 5-8 data cache characteristics,table 5-3 data cache initialization 5-8 data cache,description 5-3 dcb opcode 5-6 dinvalid opcode 5-6 dirty bit 5-4 dirty bits 5-3 dual port 5-4 endian-ness 5-3 , 5-4 hidden concurrency 5-7 iclr operation 5-9 initialization 5-8 instruction cache 5-8 instruction cache coherency 5-9 instruction cache in itialization and boot 5-10 instruction cache parameters 5-8 instruction cache summary 5-8 instruction cache tag 5-8 invalidate operation 5-6 latency 5-8 locking 5-3 , 5-4 locking registers 5-5 lru replacement 5-11 memory hole 5-5
abc h e dfg ijklmnopqrstuvwxyz preliminary specification index-3 miss processing order 5-4 , 5-9 miss transfer order 5-3 mmio registers summary 5-13 noncachable region 5-3 non-cacheable region 5-5 number of sets 5-3 operation ordering 5-7 overview 5-1 overview,memory system 5-1 parameters 5-3 partial word transfers 5-4 partial words 5-3 performance evaluation support 5-12 performance events table 5-13 ports 5-3 rdstatus result format 5-6 rdtag result format 5-6 replacement policies 5-3 , 5-4 replacement policy 5-9 scheduling constraint 5-4 set 5-3 size 5-3 special data cache operations 5-6 special opcodes 5-4 special operation ordering 5-7 status operations 5-6 , 5-7 summary of characteristics 5-2 tag field of address 5-3 tag operations 5-6 , 5-7 valid bits 5-3 word in set 5-3 write misses 5-4 cache line size pci interface register 11-7 carry a-18 cccount definition 3-3 ccir 656 line timing description 7-4 pixel timing description 7-4 video connector on enhanced video out unit,picture 7-2 ccir 656 frame timing description 7-6 description table 7-6 ccir 656 line timing picture 7-5 ccir 656 pixel timing picture 7-5 ccir656 serial d1 7-2 chroma keying 14-1 chroma keying 7-14 chroma keying 14-1 , 14-9 circuit board design guidelines 12-7 class code pci interface register 11-6 clipping 7-14 codec 8-1 coherency 5-4 coherency,instruction cache 5-9 command id pci interface register 11-3 compatibility software 3-4 concurrency pci interface 11-3 concurrency,hidden 5-7 config_adr pci interface mmio register 11-12 picture 11-10 config_ctl pci interface mmio register 11-13 picture 11-10 config_data pci interface mmio register 11-13 configuration header 11-3 configuration operations pci interface 11-2 control word icp vertical filter 14-25 of icp 14-23 conversion interspersed to co-sited 7-11 to rgb 14-1 to yuv composite 14-1 yuv to rgb 14-3 , 14-9 copyback 5-4 co-sited sampling 6-4 counter 3-12 cpu stall 5-8 curcycles a-19 cycles a-20 d d1 serial 7-2 data address fields 5-3 
data breakpoint 3-13 data cache coherency 5-11 dcb operation 5-6
abc h e dfg ijklmnopqrstuvwxyz index-4 prelimin ary specification dinvalid operation 5-6 initialization 5-8 lru replacement 5-11 performance evaluation support 5-12 rdstatus operation 5-6 rdtag operation 5-6 data cache locking registers 5-5 data format planar 14-3 dc/ac characteristics 1-12 dc_lock_addr description table 5-13 register 5-5 dc_lock_ctl description table 5-13 register 5-5 dc_lock_size description table 5-13 register 5-5 dc_params description table 5-13 fields 5-3 picture 5-3 dc_params register 5-3 dcb 5-6 , a-21 dcb operation 5-6 dds 7-3 , 8-2 debug frontend 18-3 debug support 3-13 dest_adr pci interface mmio register 11-14 picture 11-10 device control 3-7 device id pci interface register 11-3 device interrupts 3-11 diagnostic mode 8-7 audio in unit 8-7 dimensions 1-10 dinvalid 5-6 , a-22 dinvalid operation 5-6 direct digital synthesizer 7-3 , 8-2 dirty bit 5-4 dithering 14-10 algorithm 14-10 method 14-10 dma operations pci interface 11-2 dma_ctl pci interface mmio register 11-14 picture 11-10 downscaling 14-1 dpc definition 3-3 dram aperture 5-2 dram base 5-2 dram limit 5-2 dram memory system address aperture 12-1 address mapping 12-5 circuit board design 12-7 example block diagrams 12-9 example configurations table 12-3 features 12-1 granularity and sizes 12-2 initialization 12-6 mode register setting 12-6 on-chip interleaving 12-6 output driver capacity 12-7 power down mode 12-7 programming 12-3 refresh 12-6 signal pins 12-5 supported devices 12-2 supported rank configurations 12-2 dram_base description table 5-13 pci interface mmio register 11-9 pci interface register 11-7 picture 5-2 , 11-10 dram_base updates 11-10 dram_cacheable_limit description table 5-13 picture 5-5 dram_limit description table 5-13 picture 5-2 dspcpu addressing modes 3-4 byte ordering 3-2 register model 3-1 software compatibility 3-4 dspcpu operations listed alphabetically a-1 listed by function a-2 dspiabs a-23 dspiadd a-24 dspidualabs a-25 dspidualadd a-26 dspidualmul a-27 
dspidualsub a-28 dspimul a-29 dspisub a-30 dspuadd a-31 dspumul a-32 dspuquadaddui a-33
abc h e dfg ijklmnopqrstuvwxyz preliminary specification index-5 dspusub a-34 dual port 5-4 e eav and sav codes description 7-5 eav format 6-5 edge sensitive interrupts 3-10 endian-ness 5-4 endianness 3-2 enhanced video out 7-1 enhanced video out unit active video definition picture 7-7 algorithms,overview 7-10 alpha blending 7-13 block diagram 7-3 ccir 656 frame timing description 7-6 description table 7-6 ccir 656 line timing description 7-4 picture 7-5 ccir 656 pixel timing description 7-4 picture 7-5 clock system 7-25 picture 7-3 connection to vid eo encoder,picture 7-2 connection to video in unit,picture 7-3 connection,ccir656,picture 7-2 data streaming 7-23 data transfer timing 7-9 dds 7-25 dds and pll setting,examples 7-25 error conditions 7-23 field definition picture 7-7 frame definition picture 7-7 frame timing signals 7-7 functions,summary 7-1 graphics overlay 7-22 graphics overlay formats 7-10 horizontal timing signals 7-7 image addressing 7-22 image definition picture 7-7 image timing 7-4 interrupts 7-23 message passing 7-23 mmio registers 7-14 ntsc 7-23 operating modes 7-13 operation,description 7-21 overlay definition picture 7-7 pal 7-23 pixel mirroring 7-11 pll filter block diagram 7-25 pll filter 7-25 progressive scan 7-6 summary of functions 7-1 timing generation description 7-6 timing register recommended values 7-23 video image data formats 7-9 yuv image format 7-9 yuv planar format 7-10 yuv upscaling 7-11 enhanced video out unit block diagram 7-3 clock system 7-3 interface pins 7-2 evo enhanced video out unit 7-1 evo_clip field description table 7-21 picture 7-20 evo_ctl field description table 7-20 picture 7-20 evo_key field description table 7-21 picture 7-20 evo_mask field description table 7-21 picture 7-20 evo_slvdly field description table 7-21 picture 7-20 exceptions definition 3-9 expansion rom base address pci interface register 11-9 f fabsval a-38 fabsvalflags a-39 fadd a-40 faddflags a-41 fdiv a-42
abc h e dfg ijklmnopqrstuvwxyz index-6 prelimin ary specification fdivflags a-43 feql a-44 feqlflags a-45 fgeq a-46 fgeqflags a-47 fgtr a-48 fgtrflags a-49 filter 5-tap 14-1 algorithm,icp horizontal 14-22 algorithm,icp vertical 14-24 coefficient,loading 14-22 horizontal 14-22 horizontal,parameter table 14-23 icp vertical 14-24 icp vertical,parameter table 14-24 parameter table,vertical 14-24 polyphase 14-1 sdram to sdram 14-24 sdram to sdram,horizontal 14-22 vertical 14-24 with rgb/yuv conversion 14-25 filtering horizontal 14-1 , 14-12 , 14-15 horizontal,icp 14-6 horizontal,method 14-11 icp 14-6 icp,5-tap 14-6 method 14-11 multi-tap 14-6 two dimensional 14-1 vertical 14-1 fleq a-50 fleqflags a-51 fles a-52 flesflags a-53 floating point exception flags 3-2 ieee rounding mode 3-2 representation 3-4 fmul a-54 fmulflags a-55 fneq a-56 fneqflags a-57 four-way lru 5-11 frame timing signals 7-7 fsign a-58 fsignflags a-59 fsqrt a-60 fsqrtflags a-61 fsub a-62 fsubflags a-63 fullres capture mode video in unit 6-1 description 6-4 funshift1 a-64 funshift2 a-65 funshift3 a-66 g general purpose registers 3-1 general purpose timer/counter 3-12 genlock 7-7 genlock mode 7-8 granularity memory 12-2 graphics overlay 7-10 , 7-22 graphics overlay formats 7-10 grid input 14-7 output 14-7 guarding definition 3-5 h h_dspiabs a-67 h_dspidualabs a-68 h_iabs a-69 h_st16d a-70 h_st32d a-71 h_st8d a-72 halfres capture mode video in unit 6-1 description 6-9 handshake mechanism jtag 18-5 hbe 8-7 header type pci interface register 11-7 hicycles a-73 hidden concurrency 5-7 hierarchical lru 5-4 highway latency audio 8-7 horizontal filtering 14-12 scaling 14-11 , 14-15 horizontal filter 14-22 parameter,table 14-23 timing 14-12 horizontal filter to rgb parameter table 14-26 horizontal filtering 14-1 , 14-15 horizontal scaling 14-1 , 14-15
abc h e dfg ijklmnopqrstuvwxyz preliminary specification index-7 horizontal timing signals 7-7 huffman code 15-1 i i/o buffer circuits 1-1 i/o operations pci interface 11-2 i2s 8-1 iabs a-74 iadd a-75 iaddi a-76 iavgonep a-77 ibytesel a-78 ic_lock_addr description table 5-13 picture 5-10 ic_lock_ctl description table 5-13 picture 5-10 ic_lock_size description table 5-13 picture 5-10 ic_params description table 5-13 picture 5-8 ic_params fields 5-8 iclear picture 3-11 iclipi a-79 iclr 5-9 , a-80 icp algorithms 14-6 alpha blending 14-9 bandwidth requirements 14-1 block diagram 14-1 chroma keying 14-9 coefficients,table 14-22 color keying 14-9 control word format 14-23 dithering 14-10 filter coefficient, loading 14-22 filter sdram to sdram 14-22 horizontal filter control word 14-27 horizontal filter parameter table 14-22 horizontal filter to rgb parameter table 14-26 horizontal filter with conversion 14-25 horizontal filter,algorithm 14-22 , 14-25 horizontal filter,table 14-23 horizontal filtering 14-6 , 14-15 horizontal scaling 14-15 image formats 14-3 image overlay formats 14-5 image overlay formats table 14-5 image resizing 14-6 image scaling 14-6 internal structure 14-1 lines mirroring 14-15 microprogram 14-16 missing pixels,filtering 14-6 move image 14-1 operation 14-16 output formats 14-5 output scaling,calculation method 14-8 overlay 14-9 parameter tables 14-22 pci block timing 14-16 pixel mirroring 14-6 priority delay 14-20 programming 14-16 registers 14-17 scaling output resolution 14-7 sdram timing 14-15 status register,pd field 14-20 upscaling example 14-7 vertical filter 14-24 vertical filter algorithm 14-24 vertical filter control word 14-25 vertical filter parameter table 14-24 vertical filtering 14-6 yuv formats 14-3 yuv sequence counter 14-15 yuv to rgb conversion 14-9 icp (image co-processor) 14-1 icp_dp, mmio register 14-17 icp_dr, mmio register 14-17 icp_mir, mmio register 14-17 icp_mpc, mmio register 14-17 icp_sr, mmio register 14-17 ident a-81 
ieee 1149.1 1-1 ieee rounding mode 3-2 ieql a-82 ieqli a-83 ifir16 a-84 ifir8ii a-85 ifir8ui a-86 ifixieee a-87 ifixieeeflags a-88 ifixrz a-89 ifixrzflags a-90 iflip a-91 ifloat a-92 ifloatflags a-93 ifloatrz a-94 ifloatrzflags a-95
abc h e dfg ijklmnopqrstuvwxyz index-8 prelimin ary specification igeq a-96 igeqi a-97 igtr a-98 igtri a-99 iimm a-100 iis 8-1 ijmpf a-101 ijmpi a-102 ijmpt a-103 ild16 a-104 ild16d a-105 ild16r a-106 ild16x a-107 ild8 a-108 ild8d a-109 ild8r a-110 ileq a-111 ileqi a-112 iles a-113 ilesi a-114 image icp input format 14-3 processing algorithms 14-6 resizing 14-6 scaling 14-6 scaling factor range 14-3 size range 14-3 image co-processor block diagram 14-1 image co-processor 14-1 block diagram 14-2 image formats 14-3 image overlay 14-1 , 14-5 , 14-9 image overlay formats of icp,table 14-5 image processing bandwidth 14-1 imask picture 3-11 imax a-115 imin a-116 imul a-117 imulm a-118 ineg a-119 ineq a-120 ineqi a-121 initialization dram memory system 12-6 instruction cache 5-10 initialization,cache 5-8 inonzero a-122 input format icp 14-3 input grid relating to output grid 14-7 instruction breakpoint 3-13 instruction cache 5-8 address mapping 5-8 picture 5-9 coherency 5-11 initialization and boot 5-10 lru replacement 5-11 performance evaluation support 5-12 instruction cache parameters 5-8 instruction cache set 5-8 instruction cache tag 5-8 instruction cache,summary 5-8 int_ctl pci interface mmio register 11-15 picture 3-12 , 11-10 integer representation 3-4 interleaving of sdram 12-6 interrupt line pci interface register 11-9 interrupt mask 3-10 interrupt mode 3-10 interrupt pin pci interface register 11-9 interrupt priority 3-10 interrupt vectors 3-9 interrupts 3-9 definition 3-9 dspcpu enable bit 3-2 interspersed sampling 6-5 intervals refresh 12-6 intvec[31:0] picture 3-9 io_adr pci interface mmio register 11-13 picture 11-10 io_ctl pci interface mmio register 11-13 picture 11-10 io_data pci interface mmio register 11-13 picture 11-10 ipending picture 3-11 is 11172-2 references 15-3 is 13818-2 references table 15-3 isetting0 picture 3-10 isetting1
abc h e dfg ijklmnopqrstuvwxyz preliminary specification index-9 picture 3-10 isetting2 picture 3-10 isetting3 picture 3-10 isub a-123 isubi a-124 izero a-125 j jmpf a-126 jmpi a-127 jmpt a-128 jtag additional registers picture 18-4 bypass instruction 18-2 communication protocol 18-5 example datat transfer 18-5 extest instruction 18-2 instruction encodings table 18-2 instructions sel_data_in 18-5 sel_data_out 18-5 sel_ifull_in 18-5 sel_jtag_ctrl 18-5 sel_ofull_out 18-5 macro instruction 18-3 mmio registers table 18-4 overview 18-1 race condition,avoid 18-5 reset instruction 18-2 sample/preload instruction 18-2 sel_data_in instruction 18-2 sel_data_out instruction 18-3 sel_ifull_in instruction 18-3 sel_jtag_ctrl instruction 18-3 sel_ofull_out instruction 18-3 system components 18-3 tap controller description 18-1 tap controller state diagram,picture 18-2 test access port 18-1 test clock 18-1 , 18-3 test data in 18-1 test data out 18-1 test mode select 18-1 virtual registers 18-4 jtag_ctrl register 18-4 jtag_data_in register 18-4 jtag_data_out register 18-4 jtag_ifull_in 18-4 jtag_ofull_out 18-4 k keying chroma 14-9 color 14-9 l latency timer pci interface register 11-7 latency,memory operation 5-8 ld32 a-129 ld32d a-130 ld32r a-131 ld32x a-132 level sensitive interrupts 3-10 lines mirroring 14-15 load coefficients parameter table 14-22 load store ordering 3-3 , 3-5 , 3-7 , 5-5 , 17-4 , 17-6 locking conditions 5-4 locking range 5-4 lru bit definition 5-12 lru bit definitions,picture 5-12 lru bit update ordering 5-12 lru initialization 5-12 lru replacement,cache 5-11 lru, hierarchical 5-4 lru,four-way 5-11 lru,two-way 5-11 lsl a-133 lsli a-134 lsr a-135 lsri a-136 m macro block header 15-1 macroblock header, standard references 15-3 main image 14-9 max_lat pci interface register 11-9 maximum ratings 1-12 mem_events description table 5-13 picture 5-12
abc h e dfg ijklmnopqrstuvwxyz index-10 preliminary specification memory operation ordering 5-7 memory data formats audio in unit 8-4 audio out unit 9-7 memory format audio 8-4 memory hole 5-5 memory map 3-7 picture 3-7 memory mapped devices 3-7 mergelsb a-138 mergemsb a-139 message passing mode video in unit description 6-11 message-passing mode video in unit 6-1 description 6-11 min_gnt pci interface register 11-9 mirroring lines 14-15 pixels 14-12 misaligned store 3-3 miss processing,order 5-9 mm_a[11:0] description table 12-5 mm_cas# description table 12-5 mm_cke[3:0] description table 12-5 mm_clk[1:0] description table 12-5 mm_cs#[3:0] description table 12-5 mm_dq[31:0] description table 12-5 mm_dqm description table 12-5 mm_ras# description table 12-5 mm_we# description table 12-5 mmio 3-7 mmio aperture picture 3-8 mmio references,non-cached 5-8 mmio registers ai_base1 picture 8-5 ai_base2 picture 8-5 ai_control field description table 8-6 ai_ctl picture 8-5 ai_framing picture 8-5 ai_freq picture 8-5 ai_serial picture 8-5 ai_size picture 8-5 ai_status field description table 8-6 picture 8-5 ao_base1 picture 9-8 ao_base2 picture 9-8 ao_cc picture 9-8 ao_cfc picture 9-8 ao_control field description table 9-9 , 9-10 ao_ctl picture 9-8 ao_framing picture 9-8 ao_freq picture 9-8 ao_serial picture 9-8 ao_size picture 9-8 ao_status field description table 9-9 picture 9-8 , 16-2 bdataahigh picture 3-14 bdataalow picture 3-14 bdatamask picture 3-14 bdataval picture 3-14 bdctl picture 3-14 bictl picture 3-14 binsthigh picture 3-14
abc h e dfg ijklmnopqrstuvwxyz preliminary specif ication index-11 binstlow picture 3-14 biu_ctl 11-11 picture 11-10 biu_status 11-11 picture 11-10 cache registers summary 5-13 config_adr 11-12 picture 11-10 config_ctl 11-13 picture 11-10 config_data 11-13 dc_lock_addr description table 5-13 picture 5-5 dc_lock_ctl description table 5-13 picture 5-5 dc_lock_size description table 5-13 picture 5-5 dc_params 5-3 description table 5-13 fields 5-3 picture 5-3 dest_adr 11-14 picture 11-10 dma_ctl 11-14 picture 11-10 dram_base 11-9 description table 5-13 picture 5-2 , 11-10 dram_cacheable_limit description table 5-13 picture 5-5 dram_limit description table 5-13 picture 5-2 evo_clip picture 7-20 evo_ctl picture 7-20 evo_key picture 7-20 evo_maskk picture 7-20 evo_slvdly picture 7-20 for vld 15-4 ic_lock_addr description table 5-13 picture 5-10 ic_lock_ctl description table 5-13 picture 5-10 ic_lock_size description table 5-13 picture 5-10 ic_params description table 5-13 fields 5-8 picture 5-8 iclear picture 3-11 icp_dp 14-17 icp_dr 14-17 icp_mir 14-17 icp_mpc 14-17 icp_sr 14-17 imask picture 3-11 int_ctl 11-15 picture 3-12 , 11-10 intvec[31:0] picture 3-9 io_adr 11-13 picture 11-10 io_ctl 11-13 picture 11-10 io_data 11-13 picture 11-10 ipending picture 3-11 isetting0 picture 3-10 isetting1 picture 3-10 isetting2 picture 3-10 isetting3 picture 3-10 jtag registers 18-4 jtag_ctrl 18-4 jtag_data_in 18-4 jtag_data_out 18-4 mem_events description table 5-13 picture 5-12 mm_config picture 12-4 mmio_base 11-9 description table 5-13 picture 11-10 of enhanced video out unit 7-14 of icp 14-17 pci interface
abc h e dfg ijklmnopqrstuvwxyz index-12 preliminary specification accessibility 11-11 pci_adr 11-12 picture 11-10 pci_data 11-12 picture 11-10 pll_ratios picture 12-4 scr_adr picture 11-10 setup of ssi_ctl 17-6 spdo_base1 picture 10-5 spdo_base2 picture 10-5 spdo_ctl picture 10-5 spdo_freq picture 10-5 spdo_size picture 10-5 spdo_status picture 10-5 spdo_tstamp picture 10-5 src_adr 11-14 ssi_csr fields description 17-11 ssi_ctl fields description 17-9 summary table b-1 tctl picture 3-13 tmodulus picture 3-13 tvalue picture 3-13 vi_base1 alignment 6-11 picture 6-10 vi_base2 alignment 6-11 picture 6-10 vi_cap_size picture 6-8 vi_cap_start picture 6-8 vi_clock picture 6-8 , 6-10 vi_ctl picture 6-8 , 6-10 vi_size picture 6-10 vi_status picture 6-8 , 6-10 vi_u_base_adr picture 6-8 vi_uv_delta picture 6-8 vi_v_base_adr picture 6-8 vi_y_base_adr picture 6-8 vi_y_delta picture 6-8 video in, view in raw and message passing mode picture 6-10 video in,yuv capture 6-8 vld unit,picture 15-6 vo_clock common values 7-23 picture 7-15 vo_ctl fields description table 7-17 picture 7-15 vo_field default values 7-23 picture 7-15 vo_frame default values 7-23 picture 7-15 vo_image default values 7-23 picture 7-15 vo_line default values 7-23 picture 7-15 vo_oladd field description table 7-19 picture 7-15 vo_olhw picture 7-15 vo_olstart picture 7-15 vo_status picture 7-15 vo_uadd field description table 7-19 picture 7-15 vo_vadd field description table 7-19 picture 7-15 vo_vuf picture 7-15 vo_yadd picture 7-15 vo_yolf
field description table 7-19 picture 7-15 vo_ythr picture 7-15 vo_yuf field description table 7-19 mmio_base description table 5-13 pci interface mmio register 11-9 pci interface register 11-7 picture 11-10 mmio_base updates 11-10 mpeg bitstream 15-1 mpeg-1 macroblock header 15-3 mpeg-1 macroblock header,output format 15-4 mpeg-1 standard references 15-3 mpeg-2 macroblock header 15-3 mpeg-2 macroblock header,output format 15-2 mpeg-2 standard references table 15-3 multi-tap fir filtering 14-6 n new features 1-1 non cacheable region 5-5 noncachable region 5-3 non-interlaced scan 7-6 non-maskable interrupt 3-10 nop a-140 ntsc 7-23 o offset byte in set 5-8 operation ordering,special 5-7 operations dspcpu a-1 , a-2 order,miss processing 5-9 ordering memory operations 5-7 ordering information 1-10 ordering,special operation 5-7 output formats icp 14-5 output grid relating to input grid 14-7 output scaling calculation 14-8 overlap configuration of windows 14-1 overlay blending 14-9 of image 14-1 overlay formats of icp 14-5 overlay image 14-9 overlay, image 14-5 , 14-9 overlays computer generated 14-9 oversampling a/d converter 8-2 p pack16lsb a-141 pack16msb a-142 package outline 1-10 package,bga package 1-10 packbytes a-143 pal 7-23 parameter table icp horizontal filter 14-23 parameter tables horizontal filter to rgb 14-26 icp 14-22 vertical filter 14-24 part number 1-10 partial words 5-4 pci aperture 11-2 output block timing 14-16 space 11-2 pci aperture 5-5 pci configuration space 11-3 pci header 11-3 pci interface characteristics overview 11-1 concurrency 11-3 configuration header 11-3 configuration operations 11-2 configuration registers 11-3 dma operations 11-2 i/o operations 11-2 initiator 11-2 limitations 11-17 ordering 11-3 priorities 11-3 registers base addresses 11-7 built-in self test 11-7 cache line size 11-7 class code 11-6 command fields 11-5
command id 11-3 device id 11-3 dram_base 11-7 expansion rom base address 11-9 header type 11-7 interrupt line 11-9 interrupt pin 11-9 latency timer 11-7 max_lat 11-9 min_gnt 11-9 mmio_base 11-7 revision id 11-6 status 11-5 fields 11-6 vendor id 11-3 single word load/store 11-2 target of operations 11-3 pci references,non-cached 5-8 pci_adr pci interface mmio register 11-12 picture 11-10 pci_data pci interface mmio register 11-12 picture 11-10 pcsw definition 3-2 performance events,cache 5-13 philips part number 1-10 pins ai_osclk description table 8-1 ai_sck description table 8-1 ai_sd description table 8-1 ai_ws description table 8-1 ao_osclk description table 9-2 ao_sck description table 9-2 complete list 1-2 dc/ac characteristics 1-12 i/o circuit summary 1-1 mm_cas# description table 12-5 mm_clk[1:0] description table 12-5 mm_cs#[3:0] description table 12-5 mm_dq[31:0] description table 12-5 mm_dqm description table 12-5 mm_ras# description table 12-5 mm_we# description table 12-5 package 1-10 spdo description table 10-1 timing 1-19 , 1-20 , 1-21 vi_clk description table 6-2 vi_data[7:0] description table 6-2 vi_data[8] 6-11 vi_data[9:8] description table 6-2 vi_data[9] 6-11 vi_dvalid description table 6-2 vo_clk description table 7-3 vo_data[7:0] description table 7-3 vo_io1 description table 7-3 vo_io2 description table 7-3 pixel mirroring 14-6 missing 14-6 shift bypassing for downscaling 14-8 transformation,scaling 14-7 pixel mirroring 7-11 pixels mirroring 14-12 planar data format 14-3 pll filter of video out 7-25 polyphase filter 14-1 power down mode dram memory system 12-7 of sdram 12-7 pref a-144 pref16x a-145 pref32x a-146 prefd a-147 prefr a-148 priority delay 14-20 progressive scan 7-6
q quadavg a-149 , a-150 quadumulmsb a-151 , a-152 quasi-dual 5-4 r rank size vs. address mapping 12-5 , 12-6 raw capture modes video in unit description 6-10 raw10s capture mode video in unit 6-1 raw10u capture mode video in unit 6-1 raw8 capture mode video in unit 6-1 rdstatus a-153 result format 5-6 rdstatus operation 5-6 result format picture 5-6 rdtag a-154 result format 5-6 rdtag operation 5-6 result format picture 5-6 readdpc a-155 readpcsw a-156 readspc a-157 refresh dram memory system 12-6 intervals 12-6 region noncachable 5-3 region,non-cacheable 5-5 register model 3-1 , 4-1 replacement 5-4 representation boolean 3-3 floating point 3-4 integer 3-4 rescaling of images 14-1 resizing horizontal 14-1 in icp 14-6 vertical 14-1 revision id pci register 11-6 rgb conversion 14-1 rol a-158 roli a-159 run-level output data 15-1 s sample rate 8-1 , 8-2 sav and eav codes description 7-5 description table 7-6 format picture 7-5 sav format 6-5 scaling 14-6 algorithm 14-8 horizontal 14-1 , 14-11 , 14-15 horizontal,method 14-11 method 14-11 range 14-3 shift bypassing 14-8 two dimensional 14-1 vertical 14-1 , 14-13 sdram 12-2 supported devices 12-2 , 13-7 sdram memory system timing budget 12-8 sequence counter yuv 14-15 serial ccir656 7-2 serial frame 8-1 , 8-3 serial interface 17-1 sex16 a-160 sex8 a-161 sgram 12-2 supported devices 12-2 , 13-7 size of image,range 14-3 software compatibility 3-4 software interrupt 3-11 spc definition 3-3 spdo description table 10-1 spdo_base1 picture 10-5 spdo_base2 picture 10-5 spdo_ctl picture 10-5 spdo_freq picture 10-5 spdo_size picture 10-5 spdo_status picture 10-5 spdo_tstamp picture 10-5
speculative loads 3-3 , 3-5 , 3-7 , 5-5 , 17-4 , 17-6 src_adr pci interface mmio register 11-14 picture 11-10 ssi_ctl field description 17-9 st16 a-162 st16d a-163 st32 a-164 st32d a-165 st8 a-166 st8d a-167 stall,cpu 5-8 status pci interface register 11-5 status operations,cache 5-6 , 5-7 stereo 8-1 stereo a/d converter 8-1 store misaligned 3-3 subsampling horizontal 14-1 vertical 14-1 synchronous serial interface 17-1 synthesizer 8-2 synthesizer,digital 7-3 t tag operations 5-6 , 5-7 tap controller 18-1 description 18-1 tap,test access port 18-1 tctl picture 3-13 termination guidelines 12-7 test access port 18-1 tfe definition 3-3 timer 3-12 timing 1-19 sdram block 14-15 vertical filter 14-15 timing reference codes 6-5 tmodulus picture 3-13 translucent background 14-9 foreground 14-9 tvalue picture 3-13 two-way lru 5-11 u ubytesel a-168 uclipi a-169 uclipu a-170 ueql a-171 ueqli a-172 ufir16 a-173 ufir8uu a-174 ufixieee a-175 ufixieeeflags a-176 ufixrz a-177 ufixrzflags a-178 ufloat a-179 ufloatflags a-180 ufloatrz a-181 ufloatrzflags a-182 ugeq a-183 ugeqi a-184 , a-186 ugtr a-185 uimm a-187 uld16 a-188 uld16d a-189 uld16r a-190 uld16x a-191 uld8 a-192 uld8d a-193 uld8r a-194 uleq a-195 uleqi a-196 ules a-197 ulesi a-198 ume8ii a-199 ume8uu a-200 umul a-202 umulm a-203 uneq a-204 uneqi a-205 upsampling horizontal 14-1 vertical 14-1 upscaling 7-11 , 14-1 v v.34 interface block diagram 17-2 , 17-3 , 17-4 external pins,table 17-1 programming model 17-8 setup of ssi_ctl register 17-5 test modes 17-8 transmitter logic model 17-5 used as general purpose i/o
17-1 , 17-2 , 17-3 v.34 modem 17-1 vectored interrupts 3-9 vendor id pci interface register 11-3 vertical filter icp 14-24 vertical filter parameter table 14-24 vertical filtering 14-1 vertical scaling 14-1 , 14-13 vi_base1 alignment 6-11 picture 6-10 vi_base2 alignment 6-11 picture 6-10 vi_cap_size picture 6-8 vi_cap_start picture 6-8 vi_clk description table 6-2 vi_clock picture 6-8 , 6-10 vi_ctl picture 6-8 , 6-10 vi_data vi_data[8] 6-11 vi_data[9] 6-11 vi_data[7:0] description table 6-2 vi_data[9:8] description table 6-2 vi_dvalid description table 6-2 vi_size picture 6-10 vi_status picture 6-8 , 6-10 vi_u_base_adr picture 6-8 vi_uv_delta picture 6-8 vi_v_base_adr picture 6-8 vi_y_base_adr picture 6-8 vi_y_delta picture 6-8 victim of replacement 5-4 video image data formats 7-9 video in unit capture parameters explanation 6-6 picture 6-5 clock generator 6-4 clocking modes 6-4 common source parameters 6-6 connected to 10bit a/d converter picture 6-4 connected to 8bit ccir656 camera picture 6-3 connected to video out picture 6-3 connected to video recorder picture 6-3 co-sited sampling 6-4 diagnostic mode 6-2 format of sav and eav codes 6-5 fullres capture mode 6-1 description 6-4 halfres capture mode 6-1 description 6-9 halfres co-sited sample capture picture 6-9 halfres interspersed sample capture picture 6-9 halfres planar memory format picture 6-9 highway latency requirements 6-13 highway latency,hbe description 6-13 interface pins description table 6-2 interspersed sampling 6-5 message passing major states diagram 6-12 message passing mode description 6-11 example signal diagram 6-12 message-passing mode 6-1 description 6-11 power down 6-2 raw and message passing modes mmio register view, picture 6-10 raw capture modes description 6-10 raw mode,major states,diagram 6-11 raw10s capture mode 6-1 raw10u capture mode 6-1 raw8 capture mode 6-1 reset 6-2 yuv 4:2:2 planar memory format picture 6-7 
yuv capture view of mmio registers 6-8 virtual registers 18-4 vld
command register 15-1 command register,description 15-7 commands 15-1 cpu interaction 15-2 error handling,description 15-8 flush output command 15-1 input,description 15-2 interrupt description 15-8 introduction 15-1 mmio registers 15-4 picture 15-6 operational registers,description 15-7 output,description 15-3 parse command 15-1 parsing action 15-2 picture info register,description 15-8 quantizer scale register,description 15-7 reset command 15-1 reset description 15-8 search command 15-1 shift command 15-1 shift register,description 15-7 software reset procedure 15-8 stop reasons 15-1 vo video out unit 7-1 vo_clk description table 7-3 vo_clock common values 7-23 field description table 7-18 picture 7-15 vo_ctl fields 7-17 picture 7-15 vo_data[7:0] description table 7-3 vo_field default values 7-23 field description table 7-18 picture 7-15 vo_frame default values 7-23 field description table 7-18 picture 7-15 vo_image default values 7-23 field description table 7-19 picture 7-15 vo_io1 description table 7-3 vo_io2 description table 7-3 vo_line default values 7-23 field description table 7-19 picture 7-15 vo_oladd field description table 7-19 picture 7-15 vo_olhw field description table 7-19 picture 7-15 vo_olstart field description table 7-19 picture 7-15 vo_status field description table 7-16 picture 7-15 vo_uadd field description table 7-19 picture 7-15 vo_vadd field description table 7-19 picture 7-15 vo_vuf picture 7-15 vo_yadd field description table 7-19 picture 7-15 vo_yolf field description table 7-19 picture 7-15 vo_ythr field description table 7-7 , 7-19 picture 7-15 vo_yuf field description table 7-19 w write misses 5-4 writedpc a-206 writepcsw a-207 writespc a-208 y yuv formats of icp 14-3 sequence counter 14-15 yuv capture view of video in mmio registers 6-8 yuv conversion 14-1 yuv image format 7-9
yuv planar format 7-10 yuv to rgb conversion 14-9 yuv to rgb converter 14-1 yuv upscaling 7-11 z zex16 a-209 zex8 a-210
© philips electronics n.v. sca all rights are reserved. reproduction in whole or in part is prohibited without the prior written consent of the copyright owner. the information presented in this document does not form part of any quotation or contract, is believed to be accurate and reliable and may be changed without notice. no liability will be accepted by the publisher for any consequence of its use. publication thereof does not convey nor imply any license under patent- or other industrial or intellectual property rights.
internet: http://www.semiconductors.philips.com 2004 69 printed in the united states of america
philips semiconductors - a worldwide company
for all other countries apply to: philips semiconductors, international marketing & sales communications, building be-p, p.o. box 218, 5600 md eindhoven, the netherlands, fax. +31 40 27 24825
argentina: see south america
australia: 3 figtree drive, homebush, nsw 2140, tel. +61 2 9704 8141, fax. +61 2 9704 8139
austria: computerstr. 6, a-1101 wien, p.o. box 213, tel. +43 1 60 101 1248, fax. +43 1 60 101 1210
belarus: hotel minsk business center, bld. 3, r. 1211, volodarski str. 6, 220050 minsk, tel. +375 172 20 0733, fax. +375 172 20 0773
belgium: see the netherlands
brazil: see south america
bulgaria: philips bulgaria ltd., energoproject, 15th floor, 51 james bourchier blvd., 1407 sofia, tel. +359 2 68 9211, fax. +359 2 68 9102
canada: philips semiconductors/components, tel. +1 800 234 7381, fax. +1 800 943 0087
china/hong kong: 501 hong kong industrial technology centre, 72 tat chee avenue, kowloon tong, hong kong, tel. +852 2319 7888, fax. +852 2319 7700
colombia: see south america
czech republic: see austria
denmark: sydhavnsgade 23, 1780 copenhagen v, tel. +4533293333, fax. +4533293905
finland: sinikalliontie 3, fin-02630 espoo, tel. +358 9 615 800, fax. +358 9 6158 0920
france: 51 rue carnot, bp317, 92156 suresnes cedex, tel. +33 1 4099 6161, fax. +33 1 4099 6427
germany: hammerbrookstraße 69, d-20097 hamburg, tel. +4940235360, fax. +494023536300
hungary: see austria
india: philips india ltd, band box building, 2nd floor, 254-d, dr. annie besant road, worli, mumbai 400 025, tel. +91 22 493 8541, fax. +91 22 493 0966
indonesia: pt philips development corporation, semiconductors division, gedung philips, jl. buncit raya kav.99-100, jakarta 12510, tel. +62 21 794 0040 ext. 2501, fax. +62 21 794 0080
ireland: newstead, clonskeagh, dublin 14, tel. +35317640000, fax. +35317640200
israel: rapac electronics, 7 kehilat saloniki st, po box 18053, tel aviv 61180, tel. +972 3 645 0444, fax. +972 3 649 1007
italy: philips semiconductors, via casati, 23 - 20052 monza (mi), tel. +39 039 203 6838, fax +39 039 203 6800
japan: philips bldg 13-37, kohnan 2-chome, minato-ku, tokyo 108-8507, tel. +81 3 3740 5130, fax. +81 3 3740 5057
korea: philips house, 260-199 itaewon-dong, yongsan-ku, seoul, tel. +82 2 709 1412, fax. +82 2 709 1415
malaysia: no. 76 jalan universiti, 46200 petaling jaya, selangor, tel. +60 3 750 5214, fax. +60 3 757 4880
mexico: 5900 gateway east, suite 200, el paso, texas 79905, tel. +9-5 800 234 7381, fax +9-5 800 943 0087
middle east: see italy
netherlands: postbus 90050, 5600 pb eindhoven, bldg. vb, tel. +31402782785, fax. +31402788399
new zealand: 2 wagener place, c.p.o. box 1041, auckland, tel. +64 9 849 4160, fax. +64 9 849 7811
norway: box 1, manglerud 0612, oslo, tel. +4722748000, fax. +4722748341
pakistan: see singapore
philippines: philips semiconductors philippines inc., 106 valero st. salcedo village, p.o. box 2108 mcc, makati, metromanila, tel. +6328166380, fax. +6328173474
poland: al.jerozolimskie 195 b, 02-222 warsaw, tel. +48225710000, fax. +48225710001
portugal: see spain
romania: see italy
russia: philips russia, ul. usatcheva 35a, 119048 moscow, tel. +7 095 755 6918, fax. +7 095 755 6919
singapore: lorong 1, toa payoh, singapore 319762, tel. +65 350 2538, fax. +65 251 6500
slovakia: see austria
slovenia: see italy
south africa: s.a. philips pty ltd., 195-215 main road martindale, 2092 johannesburg, p.o. box 58088 newville 2114, tel. +27 11 471 5401, fax. +27 11 471 5398
south america: al. vicente pinzon, 173, 6th floor, 04547-130 são paulo, sp, brazil, tel. +55 11 821 2333, fax. +55 11 821 2382
spain: balmes 22, 08007 barcelona, tel. +34 93 301 6312, fax. +34 93 301 4107
sweden: kottbygatan 7, akalla, s-16485 stockholm, tel. +46 8 5985 2000, fax. +46 8 5985 2745
switzerland: allmendstrasse 140, ch-8027 zürich, tel. +41 1 488 2741, fax. +41 1 488 3263
taiwan: philips semiconductors, 6f, no. 96, chien kuo n. rd., sec. 1, taipei, taiwan, tel. +886221342886, fax. +886221342874
thailand: philips electronics (thailand) ltd., 209/2 sanpavuth-bangna road prakanong, bangkok 10260, tel. +66 2 745 4090, fax. +66 2 398 0793
turkey: yukari dudullu, org. san. blg., 2.cad. nr. 28 81260 umraniye, istanbul, tel. +90 216 522 1500, fax. +90 216 522 1813
ukraine: philips ukraine, 4 patrice lumumba str., building b, floor 7, 252042 kiev, tel. +380 44 264 2776, fax. +380 44 268 0461
united kingdom: philips semiconductors ltd., 276 bath road, hayes, middlesex ub3 5bx, tel. +44 208 730 5000, fax. +44 208 754 8421
united states: 811 east arques avenue, sunnyvale, ca 94088-3409, tel. +1 800 234 7381, fax. +1 800 943 0087
uruguay: see south america
vietnam: see singapore
yugoslavia: philips, trg n. pasica 5/v, 11000 beograd, tel. +381 11 3341 299, fax. +381 11 3342 553
date of release: 2004 aug 20 document order number: xxxx xxx xxxxx
2004 aug 20 philips semiconductors product specification media processor pnx1300/01/02/11
